41

Visual Hierarchical Dimension Reduction

Yang, Jing 09 January 2002 (has links)
Traditional visualization techniques for multidimensional data sets, such as parallel coordinates, star glyphs, and scatterplot matrices, do not scale well to high dimensional data sets. A common approach to solve this problem is dimensionality reduction. Existing dimensionality reduction techniques, such as Principal Component Analysis, Multidimensional Scaling, and Self-Organizing Maps, have serious drawbacks in that the generated low dimensional subspace has no intuitive meaning to users. In addition, little user interaction is allowed in those highly automatic processes. In this thesis, we propose a new methodology for dimensionality reduction that combines automation and user interaction for the generation of meaningful subspaces, called the visual hierarchical dimension reduction (VHDR) framework. Firstly, VHDR groups all dimensions of a data set into a dimension hierarchy. This hierarchy is then visualized using a radial space-filling hierarchy visualization tool called Sunburst. Users are thus able to interactively explore and modify the dimension hierarchy, and select clusters at different levels of detail for the data display. VHDR then assigns a representative dimension to each dimension cluster selected by the users. Finally, VHDR maps the high-dimensional data set into the subspace composed of these representative dimensions and displays the projected subspace. To accomplish the latter, we have designed several extensions to existing popular multidimensional display techniques, such as parallel coordinates, star glyphs, and scatterplot matrices. These displays have been enhanced to express the semantics of the selected subspace, such as the context of the dimensions and the dissimilarity among the individual dimensions in a cluster. We have implemented all these features and incorporated them into the XmdvTool software package, which will be released as XmdvTool Version 6.0. Lastly, we developed two case studies to show how we apply VHDR to visualize and interactively explore a high dimensional data set.
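For a concrete feel of the dimension-clustering step described above, the sketch below groups the columns of a data matrix by correlation and picks one representative dimension per cluster. It is only an illustration of the general idea on assumed random data; it is not the VHDR/XmdvTool implementation, and the function and variable names are hypothetical.

```python
# Hypothetical sketch of dimension clustering with representative dimensions,
# in the spirit of VHDR (not the actual XmdvTool implementation).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def representative_dimensions(X, n_clusters):
    """Group the columns of X into clusters and pick one representative per cluster."""
    corr = np.corrcoef(X, rowvar=False)            # dimension-by-dimension correlation
    dist = 1.0 - np.abs(corr)                      # similar dimensions -> small distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")

    reps = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        # Representative = member most correlated, on average, with its cluster.
        avg_corr = np.abs(corr[np.ix_(members, members)]).mean(axis=1)
        reps.append(members[np.argmax(avg_corr)])
    return labels, sorted(reps)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                     # placeholder data matrix
labels, reps = representative_dimensions(X, n_clusters=5)
X_low = X[:, reps]                                 # projected subspace for display
print("representative dimensions:", reps)
```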
42

Statistický model tvaru obličeje / Statistical model of the face shape

Boková, Kateřina January 2019 (has links)
The goal of this thesis is to apply machine learning methods to datasets of scanned faces and to create a program that allows users to explore and edit faces, represented as triangle meshes, through a small number of controls. First we reduce the dimensionality of the triangle meshes by PCA, and then we predict the shape of the meshes from physical properties such as weight, height, age, and BMI. The modeled faces can be used in animation or games.
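A minimal sketch of the pipeline described in this abstract, assuming synthetic stand-ins for the scanned meshes and physical attributes (all names, sizes, and values are illustrative, and the thesis's actual pipeline may differ):

```python
# Hypothetical sketch: PCA on flattened face meshes plus a linear map from
# physical attributes to PCA coefficients.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_faces, n_vertices = 200, 1000
meshes = rng.normal(size=(n_faces, n_vertices * 3))      # each row: flattened xyz coordinates
attrs = rng.normal(size=(n_faces, 4))                     # weight, height, age, BMI (placeholders)

pca = PCA(n_components=20).fit(meshes)                    # low-dimensional face space
coeffs = pca.transform(meshes)

reg = LinearRegression().fit(attrs, coeffs)               # attributes -> PCA coefficients

new_attrs = np.array([[70.0, 175.0, 30.0, 22.9]])         # a hypothetical person
predicted_mesh = pca.inverse_transform(reg.predict(new_attrs))
print(predicted_mesh.shape)                               # (1, 3000): a synthesized face shape
```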
43

Improving Feature Selection Techniques for Machine Learning

Tan, Feng 27 November 2007 (has links)
As a commonly used technique in data preprocessing for machine learning, feature selection identifies important features and removes irrelevant, redundant or noisy features to reduce the dimensionality of the feature space. It improves the efficiency, accuracy and comprehensibility of the models built by learning algorithms. Feature selection techniques have been widely employed in a variety of applications, such as genomic analysis, information retrieval, and text categorization. Researchers have introduced many feature selection algorithms with different selection criteria. However, it has been discovered that no single criterion is best for all applications. We propose a hybrid feature selection framework based on genetic algorithms (GAs) that employs a target learning algorithm to evaluate features, a wrapper method. We call it the hybrid genetic feature selection (HGFS) framework. The advantages of this approach include the ability to accommodate multiple feature selection criteria and find small subsets of features that perform well for the target algorithm. The experiments on genomic data demonstrate that ours is a robust and effective approach that can find subsets of features with higher classification accuracy and/or smaller size compared to each individual feature selection algorithm. A common characteristic of text categorization tasks is multi-label classification with a great number of features, which makes wrapper methods time-consuming and impractical. We propose a simple filter (non-wrapper) approach called the Relation Strength and Frequency Variance (RSFV) measure. The basic idea is that informative features are those that are highly correlated with the class and distributed most differently among all classes. The approach is compared with two well-known feature selection methods in experiments on two standard text corpora. The experiments show that RSFV generates equal or better performance than the others in many cases.
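To make the wrapper idea concrete, here is a compact, hypothetical GA-wrapper sketch: the fitness of a binary feature mask is the cross-validated accuracy of a stand-in target learner (k-NN). This is not the HGFS implementation or the RSFV measure; the data, learner, and GA parameters are all illustrative.

```python
# Minimal GA-wrapper sketch for feature selection (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)

def fitness(mask):
    """Wrapper criterion: cross-validated accuracy of the target learner on the subset."""
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, X.shape[1]))               # population of binary masks
for generation in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]              # truncation selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        child = np.where(rng.random(X.shape[1]) < 0.5, a, b)  # uniform crossover
        flip = rng.random(X.shape[1]) < 0.05                  # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```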
44

Directional Control of Generating Brownian Path under Quasi Monte Carlo

Liu, Kai January 2012 (has links)
Quasi-Monte Carlo (QMC) methods are playing an increasingly important role in computational finance. This is attributed to the increased complexity of derivative securities and the sophistication of financial models. Simple closed-form solutions for the finance applications typically do not exist, and hence numerical methods need to be used to approximate their solutions. The QMC method has been proposed as an alternative to the Monte Carlo (MC) method to accomplish this objective. Unlike MC methods, the efficiency of QMC-based methods is highly dependent on the dimensionality of the problems. In particular, numerous studies have documented, under the Black-Scholes model, the critical role of the generating matrix for simulating the Brownian paths. Numerical results support the notion that a generating matrix that reduces the effective dimension of the underlying problems is able to increase the efficiency of QMC. Consequently, dimension reduction methods such as principal component analysis, Brownian bridge, Linear Transformation and Orthogonal Transformation have been proposed to further enhance QMC. Motivated by these results, we first propose a new measure to quantify the effective dimension. We then propose a new dimension reduction method which we refer to as the directional control (DC) method. The proposed DC method has the advantage that it depends explicitly on the given function of interest. Furthermore, by assigning appropriately the direction of importance of the given function, the proposed method optimally determines the generating matrix used to simulate the Brownian paths. Because of the flexibility of our proposed method, it can be shown that many of the existing dimension reduction methods are special cases of the proposed DC method. Finally, many numerical examples are provided to support the competitive efficiency of the proposed method.
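As background on the role of the generating matrix, the sketch below builds Brownian paths with the standard PCA construction driven by scrambled Sobol' points. This is a common QMC dimension-reduction baseline rather than the proposed DC method, and the grid and sample sizes are arbitrary.

```python
# Sketch of the PCA construction of Brownian paths driven by Sobol' points
# (a standard QMC baseline, not the thesis's DC method).
import numpy as np
from scipy.stats import norm, qmc

d = 16                                     # number of time steps
T = 1.0
t = np.linspace(T / d, T, d)
cov = np.minimum.outer(t, t)               # Cov(B_{t_i}, B_{t_j}) = min(t_i, t_j)

eigval, eigvec = np.linalg.eigh(cov)
order = np.argsort(eigval)[::-1]           # largest eigenvalues first
A = eigvec[:, order] * np.sqrt(eigval[order])   # generating matrix: paths = A @ z

sobol = qmc.Sobol(d=d, scramble=True, seed=0)
u = sobol.random(2**10)                    # low-discrepancy points in (0, 1)^d
z = norm.ppf(u)                            # map to standard normals
paths = z @ A.T                            # each row is a discretized Brownian path

print(paths.shape, paths[:, -1].std())     # terminal std should be near sqrt(T) = 1
```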
45

Computational Methods For Functional Motif Identification and Approximate Dimension Reduction in Genomic Data

Georgiev, Stoyan January 2011 (has links)
Uncovering the DNA regulatory logic in complex organisms has been one of the important goals of modern biology in the post-genomic era. The sequencing of multiple genomes, in combination with the advent of DNA microarrays and, more recently, of massively parallel high-throughput sequencing technologies, has made possible the adoption of a global perspective on the inference of the regulatory rules governing the context-specific interpretation of the genetic code, complementing the more focused classical experimental approaches. Extracting useful information and managing the complexity resulting from the sheer volume and the high dimensionality of the data produced by these genomic assays has emerged as a major challenge, which we attempt to address in this work by developing computational methods and tools specifically designed for the study of gene regulatory processes in this new global genomic context. First, we focus on the genome-wide discovery of physical interactions between regulatory sequence regions and their cognate proteins at both the DNA and RNA level. We present a motif analysis framework that leverages the genome-wide evidence for sequence-specific interactions between trans-acting factors and their preferred cis-acting regulatory regions. The utility of the proposed framework is demonstrated on DNA and RNA cross-linking high-throughput data. A second goal of this thesis is the development of scalable approaches to dimension reduction based on spectral decomposition and their application to the study of population structure in massive high-dimensional genetic data sets. We have developed computational tools and have performed theoretical and empirical analyses of their statistical properties, with particular emphasis on the analysis of individual genetic variation measured by Single Nucleotide Polymorphism (SNP) microarrays.
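A rough illustration of the second goal, assuming a synthetic genotype matrix: approximate PCA of a samples-by-SNPs matrix via randomized SVD, the kind of scalable spectral decomposition used to study population structure. The data and component count are placeholders, not the thesis's tools.

```python
# Hedged sketch: approximate PCA of a (samples x SNPs) genotype matrix via
# randomized SVD, used here only to illustrate scalable spectral decomposition.
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.default_rng(3)
genotypes = rng.integers(0, 3, size=(500, 10_000)).astype(float)   # 0/1/2 allele counts

# Center each SNP before the decomposition (scaling could also be applied).
G = genotypes - genotypes.mean(axis=0)

U, S, Vt = randomized_svd(G, n_components=10, random_state=0)
pc_scores = U * S                         # per-individual coordinates on the top PCs
print(pc_scores.shape)                    # (500, 10): used to inspect population structure
```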
46

Analysis of Modeling, Training, and Dimension Reduction Approaches for Target Detection in Hyperspectral Imagery

Farrell, Michael D., Jr. 03 November 2005 (has links)
Whenever a new sensor or system comes online, engineers and analysts responsible for processing the measured data turn first to methods that are tried and true on existing systems. This is a natural, if not wholly logical approach, and is exactly what has happened in the advent of hyperspectral imagery (HSI) exploitation. However, a closer look at the assumptions made by the approaches published in the literature has not been undertaken. This thesis analyzes three key aspects of HSI exploitation: statistical data modeling, covariance estimation from training data, and dimension reduction. These items are part of standard processing schemes, and it is worthwhile to understand and quantify the impact that various assumptions for these items have on target detectability and detection statistics. First, the accuracy and applicability of the standard Gaussian (i.e., Normal) model is evaluated, and it is shown that the elliptically contoured t-distribution (EC-t) sometimes offers a better statistical model for HSI data. A finite mixture approach for EC-t is developed in which all parameters are estimated simultaneously without a priori information. Then the effects of making a poor covariance estimate are shown by including target samples in the training data. Multiple test cases with ground targets are explored. They show that the magnitude of the deleterious effect of covariance contamination on detection statistics depends on algorithm type and target signal characteristics. Next, the two most widely used dimension reduction approaches are tested. It is demonstrated that, in many cases, significant dimension reduction can be achieved with only a minor loss in detection performance. In addition, a concise development of key HSI detection algorithms is presented, and the state-of-the-art in adaptive detectors is benchmarked for land mine targets. Methods for detection and identification of airborne gases using hyperspectral imagery are discussed, and this application is highlighted as an excellent opportunity for future work.
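For orientation, a simplified spectral matched-filter detector with background mean and covariance estimated from training pixels is sketched below. It is a stand-in for the adaptive detectors discussed in this thesis; the data are synthetic and the regularization and sizes are illustrative only.

```python
# Illustrative spectral matched-filter detector with a background covariance
# estimated from training pixels (simplified, not the thesis's detectors).
import numpy as np

rng = np.random.default_rng(4)
bands = 50
background = rng.normal(size=(5000, bands))        # training pixels (background only)
target_sig = rng.normal(size=bands)                # known target signature s

mu = background.mean(axis=0)
cov = np.cov(background, rowvar=False)
cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(bands))   # mild regularization

def matched_filter(x):
    """Matched-filter score for a pixel spectrum x, normalized to unit background variance."""
    s = target_sig - mu
    d = x - mu
    return (s @ cov_inv @ d) / np.sqrt(s @ cov_inv @ s)

scene = rng.normal(size=(1000, bands))
scores = np.apply_along_axis(matched_filter, 1, scene)
print(scores.mean(), scores.std())                  # roughly zero mean, unit variance on background
```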
47

Feature Reduction and Multi-label Classification Approaches for Document Data

Jiang, Jung-Yi 08 August 2011 (has links)
This thesis proposes some novel approaches for feature reduction and multi-label classification for text datasets. In text processing, the bag-of-words model is commonly used, with each document modeled as a vector in a high dimensional space. This model is often called the vector-space model. Usually, the dimensionality of the document vector is huge. Such high dimensionality can be a severe obstacle for text processing algorithms. To improve the performance of text processing algorithms, we propose a feature clustering approach to reduce the dimensionality of document vectors. We also propose an efficient algorithm for text classification. Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text classification. We propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the feature vector of a document set are grouped into clusters based on a similarity test. Words that are similar to each other are grouped into the same cluster. Each cluster is characterized by a membership function with a statistical mean and deviation. When all the words have been fed in, a desired number of clusters are formed automatically. We then have one extracted feature for each cluster. The extracted feature corresponding to a cluster is a weighted combination of the words contained in the cluster. By this algorithm, the derived membership functions match closely with and properly describe the real distribution of the training data. In addition, the user need not specify the number of extracted features in advance, so trial and error in determining the appropriate number of extracted features can be avoided. Experimental results show that our method runs faster and obtains better extracted features than other methods. We also propose a fuzzy similarity clustering scheme for multi-label text categorization, in which a document can belong to one or more categories. First, feature transformation is performed. An input document is transformed to a fuzzy-similarity vector. Next, the relevance degrees of the input document to a collection of clusters are calculated, which are then combined to obtain the relevance degree of the input document to each participating category. Finally, the input document is classified to a certain category if the associated relevance degree exceeds a threshold. In text categorization, the number of the involved terms is usually huge. An automatic classification system may suffer from large memory requirements and poor efficiency. Our scheme avoids these difficulties. In addition, we allow the region a category covers to be a combination of several sub-regions that are not necessarily connected. The effectiveness of our proposed scheme is demonstrated by the results of several experiments.
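The sketch below conveys the flavor of similarity-based feature clustering for text: words with similar class-distribution vectors are merged greedily, and each cluster yields one extracted feature. It is a simplified stand-in, not the fuzzy self-constructing algorithm of the thesis; the threshold, weights, and data are invented.

```python
# Simplified word-clustering sketch for dimensionality reduction of
# bag-of-words features (only in the spirit of the thesis's algorithm).
import numpy as np

def cluster_words(word_class_dist, threshold=0.95):
    """Greedy clustering: add a word to a cluster if cosine similarity to its mean is high."""
    centers, members = [], []
    for j, v in enumerate(word_class_dist):
        v = v / (np.linalg.norm(v) + 1e-12)
        sims = [v @ (c / (np.linalg.norm(c) + 1e-12)) for c in centers]
        if sims and max(sims) >= threshold:
            k = int(np.argmax(sims))
            members[k].append(j)
            centers[k] = word_class_dist[members[k]].mean(axis=0)
        else:
            centers.append(word_class_dist[j].copy())
            members.append([j])
    return members

def extract_features(doc_term, members):
    """Each extracted feature is an (equal-weight) combination of the words in one cluster."""
    return np.stack([doc_term[:, m].sum(axis=1) for m in members], axis=1)

rng = np.random.default_rng(5)
doc_term = rng.poisson(1.0, size=(100, 300)).astype(float)   # documents x words
labels = rng.integers(0, 4, size=100)
# Class-conditional word distribution: one row per word, one column per class.
word_class = np.stack([doc_term[labels == c].sum(axis=0) for c in range(4)], axis=1)
word_class = word_class / (word_class.sum(axis=1, keepdims=True) + 1e-12)

members = cluster_words(word_class)
X_reduced = extract_features(doc_term, members)
print(doc_term.shape, "->", X_reduced.shape)
```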
48

Principal Components Analysis for Binary Data

Lee, Seokho May 2009 (has links)
Principal components analysis (PCA) has been widely used as a statistical tool for the dimension reduction of multivariate data in various application areas and has been extensively studied in the long history of statistics. One limitation of the PCA machinery is that it can be applied only to continuous variables. Recent advances in information technology across applied areas have created numerous large, diverse data sets with high-dimensional feature spaces, including high-dimensional binary data. In spite of this demand, only a few methodologies tailored to such binary data sets have been suggested. The methodology we develop is a model-based approach that generalizes PCA to binary data. We develop a statistical model for binary PCA and propose two stable estimation procedures based on the MM algorithm and a variational method. By incorporating a regularization technique, the selection of important variables is achieved automatically. We also propose an efficient algorithm for model selection, including the choice of the number of principal components and the regularization parameter.
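As a way to see the model behind binary PCA, the following gradient-descent sketch fits a low-rank logistic factorization, logit P(X = 1) ≈ U Vᵀ. The thesis uses MM and variational estimators with regularization; this bare-bones version only conveys the idea, with made-up sizes and a made-up learning rate.

```python
# Bare-bones "binary PCA" as low-rank logistic factorization, fit by gradient
# descent (illustrative only; not the thesis's MM/variational estimators).
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def binary_pca(X, k=2, lr=0.05, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = 0.01 * rng.normal(size=(n, k))
    V = 0.01 * rng.normal(size=(d, k))
    for _ in range(n_iter):
        R = sigmoid(U @ V.T) - X          # gradient of the Bernoulli log-loss w.r.t. the logits
        U -= lr * (R @ V) / n
        V -= lr * (R.T @ U) / n
    return U, V

rng = np.random.default_rng(6)
true_scores = rng.normal(size=(300, 2))
true_load = rng.normal(size=(40, 2))
X = (sigmoid(true_scores @ true_load.T) > rng.random((300, 40))).astype(float)

U, V = binary_pca(X, k=2)
recon = sigmoid(U @ V.T)
print("mean reconstruction log-loss:",
      -np.mean(X * np.log(recon + 1e-9) + (1 - X) * np.log(1 - recon + 1e-9)))
```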
49

Functional data analysis: classification and regression

Lee, Ho-Jin 01 November 2005 (has links)
Functional data refer to data which consist of observed functions or curves evaluated at a finite subset of some interval. In this dissertation, we discuss statistical analysis, especially classification and regression, when data are available in functional form. Due to the nature of functional data, one considers function spaces in presenting this type of data, and each functional observation is viewed as a realization generated by a random mechanism in those spaces. The classification procedure in this dissertation is based on dimension reduction techniques for the spaces. One commonly used method is Functional Principal Component Analysis (Functional PCA), in which eigendecomposition of the covariance function is employed to find the directions of highest variability of the data in the function space. The reduced space of functions spanned by a few eigenfunctions is thought of as a space containing most of the features of the functional data. We also propose a functional regression model for scalar responses. The infinite dimensionality of the predictor space causes many problems, one of which is that there are infinitely many solutions. The space of the parameter function is restricted to Sobolev-Hilbert spaces, and the so-called ε-insensitive loss function is utilized. As a robust technique for function estimation, we present a way to find a function that deviates from the observed values by at most ε and at the same time is as smooth as possible.
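A discretized Functional PCA sketch, assuming curves observed on a common grid: the sample covariance function is eigendecomposed and the leading eigenfunctions span the reduced space. This is the standard construction rather than the dissertation's exact estimator, and the simulated curves are placeholders.

```python
# Discretized Functional PCA on curves observed on a common grid
# (standard construction, used here only for illustration).
import numpy as np

rng = np.random.default_rng(7)
grid = np.linspace(0, 1, 101)
n_curves = 150
# Simulated curves: two smooth components plus noise.
curves = (rng.normal(size=(n_curves, 1)) * np.sin(2 * np.pi * grid)
          + rng.normal(size=(n_curves, 1)) * np.cos(2 * np.pi * grid)
          + 0.1 * rng.normal(size=(n_curves, grid.size)))

mean_curve = curves.mean(axis=0)
centered = curves - mean_curve
cov = centered.T @ centered / (n_curves - 1)          # discretized covariance function

eigval, eigvec = np.linalg.eigh(cov)
order = np.argsort(eigval)[::-1]
eigfun = eigvec[:, order[:3]]                          # top 3 eigenfunctions
scores = centered @ eigfun                             # FPCA scores for classification/regression

explained = eigval[order[:3]] / eigval.sum()
print("variance explained by 3 components:", explained.round(3))
```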
50

Hypothesis Testing in GWAS and Statistical Issues with Compensation in Clinical Trials

Swanson, David Michael 27 September 2013 (has links)
We first show theoretically and in simulation how power varies as a function of SNP correlation structure with currently-implemented gene-based testing methods. We propose alternative testing methods whose power does not vary with the correlation structure. We then propose hypothesis tests for detecting prevalence-incidence bias in case-control studies, a bias perhaps overrepresented in GWAS due to currently used study designs. Lastly, we hypothesize how different incentive structures used to keep clinical trial participants in studies may interact with a background of dependent censoring and result in variation in the bias of the Kaplan-Meier survival curve estimator.
