About: The Global ETD Search service is a free service for researchers to find electronic theses and dissertations, provided by the Networked Digital Library of Theses and Dissertations (NDLTD).
51

Functional data analysis: classification and regression

Lee, Ho-Jin 01 November 2005 (has links)
Functional data refer to data consisting of observed functions or curves evaluated at a finite subset of some interval. In this dissertation, we discuss statistical analysis, especially classification and regression, when data are available in functional form. Due to the nature of functional data, such data are represented in function spaces, and each functional observation is viewed as a realization generated by a random mechanism in those spaces. The classification procedure in this dissertation is based on dimension reduction of these spaces. One commonly used method is Functional Principal Component Analysis (Functional PCA), in which an eigendecomposition of the covariance function is employed to find the directions of highest variability of the data in the function space. The reduced space spanned by a few eigenfunctions is thought of as a space containing most of the features of the functional data. We also propose a functional regression model for scalar responses. The infinite dimensionality of the predictor space causes many problems, one of which is that there are infinitely many solutions. The parameter function is restricted to Sobolev-Hilbert spaces, and the so-called ε-insensitive loss function is utilized. As a robust function-estimation technique, we present a way to find a function that deviates from the observed values by at most ε while being as smooth as possible.
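The Functional PCA step described in this abstract can be illustrated in a few lines of numpy: discretize the curves on a common grid, eigendecompose the sample covariance function, and project each curve onto the leading eigenfunctions. This is a minimal sketch of the general technique, not the dissertation's implementation; the synthetic sinusoid data and grid size are illustrative assumptions.

```python
import numpy as np

def functional_pca(curves, n_components=2):
    """Functional PCA on curves observed on a common grid.

    curves: (n_samples, n_points) array of discretized functions.
    Returns the mean curve, the leading eigenfunctions, and the
    principal component scores (coordinates in the reduced space).
    """
    mean = curves.mean(axis=0)
    centered = curves - mean
    # Sample covariance function evaluated on the grid
    cov = centered.T @ centered / (len(curves) - 1)
    # eigh returns eigenvalues in ascending order; take the largest
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:n_components]
    eigenfunctions = vecs[:, order].T        # (n_components, n_points)
    scores = centered @ eigenfunctions.T     # (n_samples, n_components)
    return mean, eigenfunctions, scores

# Noisy random-amplitude sinusoids observed at 50 grid points
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
amplitudes = rng.normal(1.0, 0.3, (100, 1))
curves = amplitudes * np.sin(2 * np.pi * t) + rng.normal(0, 0.05, (100, 50))
mean, efuns, scores = functional_pca(curves)
```

Since the curves vary mostly by amplitude of a single sine shape, the first eigenfunction recovers that shape and one score per curve captures most of the variability.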
52

Hypothesis Testing in GWAS and Statistical Issues with Compensation in Clinical Trials

Swanson, David Michael 27 September 2013 (has links)
We first show theoretically and in simulation how power varies as a function of SNP correlation structure with currently-implemented gene-based testing methods. We propose alternative testing methods whose power does not vary with the correlation structure. We then propose hypothesis tests for detecting prevalence-incidence bias in case-control studies, a bias perhaps overrepresented in GWAS due to currently used study designs. Lastly, we hypothesize how different incentive structures used to keep clinical trial participants in studies may interact with a background of dependent censoring and result in variation in the bias of the Kaplan-Meier survival curve estimator.
53

Evidence-Based Hospitals

Bardach, David R 01 January 2015 (has links)
In 2011 the University of Kentucky opened the first two inpatient floors of its new hospital. With an estimated cost of over $872 million, the new facility represents a major investment in the future of healthcare in Kentucky. This facility is outfitted with many features that were not present in the old hospital, with the expectation that they would improve the quality and efficiency of patient care. After one year of occupancy, hospital administration questioned the effectiveness of some features. Through focus groups of key stakeholders, surveys of frontline staff, and direct observational data, this dissertation evaluates the effectiveness of two such features, namely the ceiling-based patient lifts and the placement of large team meeting spaces on every unit, while also describing methods that can improve the overall state of quality improvement research in healthcare.
54

Directional Control of Generating Brownian Path under Quasi Monte Carlo

Liu, Kai January 2012 (has links)
Quasi-Monte Carlo (QMC) methods are playing an increasingly important role in computational finance, owing to the increased complexity of derivative securities and the sophistication of financial models. Simple closed-form solutions for these applications typically do not exist, so numerical methods are needed to approximate them. QMC has been proposed as an alternative to the Monte Carlo (MC) method for this purpose. Unlike MC methods, the efficiency of QMC-based methods depends heavily on the dimensionality of the problem. In particular, numerous studies have documented, under Black-Scholes models, the critical role of the generating matrix used to simulate the Brownian paths. Numerical results support the notion that a generating matrix that reduces the effective dimension of the underlying problem increases the efficiency of QMC. Consequently, dimension reduction methods such as principal component analysis, the Brownian bridge, linear transformation, and orthogonal transformation have been proposed to further enhance QMC. Motivated by these results, we first propose a new measure to quantify the effective dimension. We then propose a new dimension reduction method, which we refer to as the directional control (DC) method. The proposed DC method has the advantage that it depends explicitly on the given function of interest. Furthermore, by appropriately assigning the directions of importance of that function, the proposed method optimally determines the generating matrix used to simulate the Brownian paths. Because of its flexibility, many existing dimension reduction methods can be shown to be special cases of the proposed DC method. Finally, many numerical examples are provided to support the competitive efficiency of the proposed method.
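The role of the generating matrix can be made concrete: any matrix A with A Aᵀ = Σ, where Σᵢⱼ = min(tᵢ, tⱼ) is the Brownian covariance, turns i.i.d. normals into a discretized Brownian path, and different factorizations (step-by-step Cholesky vs. PCA) distribute variance across coordinates very differently. The sketch below shows the two standard constructions the abstract compares against; it is a generic illustration under a 16-step discretization, not the proposed DC method.

```python
import numpy as np

def cholesky_matrix(times):
    # Standard step-by-step construction: lower-triangular Cholesky
    # factor of the Brownian covariance Sigma_ij = min(t_i, t_j)
    sigma = np.minimum.outer(times, times)
    return np.linalg.cholesky(sigma)

def pca_matrix(times):
    # PCA construction: A = U sqrt(Lambda) concentrates variance in the
    # first coordinates, which tends to reduce the effective dimension
    sigma = np.minimum.outer(times, times)
    vals, vecs = np.linalg.eigh(sigma)
    order = np.argsort(vals)[::-1]
    return vecs[:, order] * np.sqrt(vals[order])

# Any matrix A with A A^T = Sigma maps i.i.d. normals z to a
# discretized Brownian path B = A z
times = np.linspace(1 / 16, 1.0, 16)
sigma = np.minimum.outer(times, times)
z = np.random.default_rng(1).standard_normal((4096, 16))
paths = z @ pca_matrix(times).T
```

In a QMC setting, z would come from a low-discrepancy sequence (after a normal transform) rather than a pseudo-random generator; the choice of A then directly affects how much of the integrand's variability is carried by the first, best-distributed coordinates.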
55

Dimension reduction via Sliced Inverse Regression: ideas and extensions

Chiancone, Alessandro 28 October 2016 (has links)
This thesis proposes three extensions of Sliced Inverse Regression (SIR), namely Collaborative SIR, Student SIR, and Knockoff SIR. One of the weak points of SIR is the impossibility of checking whether the Linearity Design Condition (LDC) holds. It is known that if X follows an elliptic distribution the condition holds true; in the case of a mixture of elliptic distributions there is no guarantee that the condition is satisfied globally, but it holds locally. Starting from this consideration, an extension is proposed: given the predictor variable X, Collaborative SIR first performs a clustering; in each cluster, SIR is applied independently, and the results from the components are combined into the final solution. Our second contribution, Student SIR, comes from the need to robustify SIR: since SIR is based on the estimation of the covariance and contains a PCA step, it is sensitive to noise. To extend SIR, an approach based on an inverse formulation of SIR proposed by R. D. Cook is used. Finally, Knockoff SIR is an extension of SIR that performs variable selection and yields sparse solutions; it has its foundations in a recently published paper by R. F. Barber and E. J. Candès on the false discovery rate in the regression framework. The underlying idea is to construct copies of the original variables that satisfy certain properties. It is shown that SIR is robust to these copies, and a strategy is proposed to use this result for variable selection and to generate sparse solutions.
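The basic SIR estimator that all three extensions build on can be sketched in a few lines of numpy: whiten the predictors, slice the response, and eigendecompose the covariance of the within-slice means. This is a minimal illustration of Li's original procedure, not any of the thesis's extensions; the slicing scheme and the synthetic single-index model are assumptions.

```python
import numpy as np

def sir(X, y, n_slices=10, n_directions=1):
    """Basic Sliced Inverse Regression (Li, 1991)."""
    n, p = X.shape
    # Whiten the predictors so cov(Z) = I
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    Linv_T = np.linalg.inv(L).T
    Z = (X - X.mean(axis=0)) @ Linv_T
    # Slice the response into groups of roughly equal size
    slices = np.array_split(np.argsort(y), n_slices)
    # Weighted covariance of the within-slice means of Z
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors span the estimated e.d.r. space;
    # back-transform to the original coordinates
    vals, vecs = np.linalg.eigh(M)
    top = vecs[:, np.argsort(vals)[::-1][:n_directions]]
    return Linv_T @ top

# Single-index model: y depends on X only through one direction
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
beta = np.array([1.0, 2.0, 0.0, 0.0, 0.0])
y = (X @ beta) ** 3 + 0.1 * rng.standard_normal(2000)
b_hat = sir(X, y)[:, 0]
```

With Gaussian X the LDC holds, so the recovered direction aligns with beta up to sign and scale; Collaborative SIR's point is precisely that this pipeline can be run per cluster when the condition only holds locally.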
56

Probabilistic Topic Models for Human Emotion Analysis

January 2015 (has links)
abstract: While discrete emotions like joy, anger, and disgust are quite popular, continuous emotion dimensions like arousal and valence are gaining popularity within the research community due to an increase in the availability of datasets annotated with them. Unlike discrete emotions, continuous emotions allow the modeling of subtle and complex affect dimensions but are difficult to predict. Dimension reduction techniques form the core of emotion recognition systems and help create a new feature space that is more helpful in predicting emotions, but they do not necessarily guarantee better predictive capability, as most of them are unsupervised, especially in regression learning. Supervised dimension reduction techniques have not been explored much in the emotion recognition literature, and in this work a solution is provided through probabilistic topic models. Topic models provide a strong probabilistic framework in which to embed new learning paradigms and modalities. In this thesis, the graphical structure of Latent Dirichlet Allocation (LDA) is explored, and new models tuned to emotion recognition and change detection are built. It is shown that the double mixture structure of topic models helps 1) to visualize feature patterns, and 2) to project features onto a topic simplex that is more predictive of human emotions than popular techniques like PCA and Kernel PCA. Traditionally, topic models have been used on quantized features, but in this work a continuous topic model, the Dirichlet Gaussian Mixture model (DGMM), is proposed. Evaluation of DGMM has shown that, when modeling videos, the performance of LDA models can be replicated even without quantizing the features. Topic models had not previously been explored in a supervised context for video analysis, so a Regularized supervised topic model (RSLDA) that models video and audio features is introduced. The RSLDA learning algorithm performs dimension reduction and regularized linear regression simultaneously, and has outperformed supervised dimension reduction techniques such as SPCA and correlation-based feature selection algorithms. In a first of their kind, two new topic models, the Adaptive temporal topic model (ATTM) and SLDA for change detection (SLDACD), have been developed for predicting concept drift in time series data. These models do not assume independence of consecutive frames and outperform traditional topic models in detecting local and global changes, respectively. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2015
57

An Application of Dimension Reduction for Intention Groups in Reddit

Sun, Xuebo, Wang, Yudan January 2016 (has links)
Reddit (www.reddit.com) is a social news platform for information sharing and exchange. The amount of data, in terms of both observations and dimensions, is enormous, because a large number of users express all aspects of their own lives in published comments. While it is easy for a human being to understand Reddit comments on an individual basis, it is a tremendous challenge to extract insights from them with a computer. In this thesis, we seek an algorithmic approach to analyze both the unique Reddit data structure and the relations between authors of comments with similar features. We explore the various types of communication between two people with common characteristics and build a communication model that characterizes the potential relationship between two users via their messages. We then seek a dimensionality reduction methodology that can merge users with similar behavior into the same groups. Along the way, we develop a computer program to collect data, define attributes based on the communication model, and apply a rule-based group merging algorithm. We then evaluate the results to show the effectiveness of this methodology. Our results show reasonable success in producing user groups that have recognizable characteristics and share similar intentions.
58

Contribution to dimension reduction techniques: application to object tracking

Lu, Weizhi 16 July 2014 (has links)
This thesis studies three popular dimension reduction techniques: compressed sensing, random projection, and sparse representation, and brings significant improvements to each. In compressed sensing, the construction of a sensing matrix with both good performance and a hardware-friendly structure has been a significant challenge. In this thesis, we explicitly propose the optimal zero-one binary matrix by searching for the best Restricted Isometry Property (RIP). In practice, an efficient greedy algorithm is developed to construct the optimal binary matrix of arbitrary size. Moreover, we also study another interesting problem in compressed sensing, namely the performance of sensing matrices at high compression rates. For the first time, the performance floor of random Bernoulli matrices over increasing compression rates is observed and effectively estimated. Random projection is mainly used for classification, where the construction of the random projection matrix is also critical in terms of both performance and complexity. This thesis presents the sparsest random projection matrix to date, which is proved to hold better feature selection performance than other, denser random matrices. The theoretical result is confirmed by extensive experiments. As a novel technique for feature or sample selection, sparse representation has recently been widely applied in the area of image processing. In this thesis, we mainly focus on its applications to visual object tracking. To reduce the computational load related to sparse representation, a simple but efficient scheme is proposed for tracking a single object. Subsequently, the potential of sparse representation for multi-object tracking is investigated.
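For context on sparse random projections, the classical construction the thesis improves upon draws entries from {+√s, 0, −√s} so that most of the matrix is zero while squared norms are preserved in expectation. The sketch below follows the well-known Achlioptas / Li-Hastie-Church scheme as a baseline illustration; the specific sparser matrix proposed in the thesis is not reproduced here, and the dimensions and s = 3 are illustrative assumptions.

```python
import numpy as np

def sparse_projection(d, k, s=3.0, seed=0):
    """Sparse random projection matrix (Achlioptas-style baseline).

    Entries are +sqrt(s), 0, -sqrt(s) with probabilities
    1/(2s), 1 - 1/s, 1/(2s); larger s gives a sparser matrix.
    Scaled by 1/sqrt(k) so squared norms are preserved in expectation.
    """
    rng = np.random.default_rng(seed)
    u = rng.random((d, k))
    R = np.zeros((d, k))
    R[u < 1 / (2 * s)] = np.sqrt(s)
    R[u > 1 - 1 / (2 * s)] = -np.sqrt(s)
    return R / np.sqrt(k)

# Project 200 points from 1000 dimensions down to 50
X = np.random.default_rng(1).standard_normal((200, 1000))
Xp = X @ sparse_projection(1000, 50)
```

With s = 3, about two thirds of the entries are exactly zero, so the projection costs roughly a third of a dense matrix multiply while distances and norms are approximately preserved.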
59

BAYESIAN DYNAMIC FACTOR ANALYSIS AND COPULA-BASED MODELS FOR MIXED DATA

Safari Katesari, Hadi 01 September 2021 (has links)
Available statistical methodologies focus mostly on accommodating continuous variables; recently, however, dealing with count data has received high interest in the statistical literature. In this dissertation, we propose statistical approaches to investigate linear and nonlinear dependencies between two discrete random variables, or between discrete and continuous random variables. Copula functions are powerful tools for modeling dependencies between random variables. We derive the copula-based population version of Spearman's rho when at least one of the marginal distributions is discrete. In each case, the functional relationship between Kendall's tau and Spearman's rho is obtained. The asymptotic distributions of the proposed estimators of these association measures are derived, their corresponding confidence intervals are constructed, and tests of independence are developed. We then propose a Bayesian copula factor autoregressive model for mixed time series data. This model assumes conditional independence and shares latent factors in both the mixed-type response and the multivariate predictor variables of the time series through a quadratic time series regression model. It reduces dimensionality by accommodating latent factors in both the response and predictor variables of the high-dimensional time series data. A semiparametric time series extended rank likelihood technique is applied to the marginal distributions to handle the mixed-type predictors, which decreases the number of estimated parameters and provides an efficient computational algorithm. In order to update and compute the posterior distributions of the latent factors and other model parameters, we propose a naive Bayesian algorithm with Metropolis-Hastings and Forward Filtering Backward Sampling methods. We evaluate the performance of the proposed models and methods through simulation studies. Finally, each proposed model is applied to a real dataset.
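The sample counterpart of Spearman's rho with a discrete margin can be computed from midranks, where tied values receive the average of the ranks they span. The sketch below is a generic illustration of that estimator, not the population version or the asymptotic theory derived in the dissertation; the simulated count/continuous pair is an assumption.

```python
import numpy as np

def midranks(x):
    # Average ("mid") ranks: tied values get the mean of their rank span
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x))
    sx = x[order]
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sx[j + 1] == sx[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1.0
        i = j + 1
    return ranks

def spearman_rho(x, y):
    # Sample Spearman's rho: Pearson correlation of the midranks,
    # which stays well defined when a margin is discrete (has ties)
    rx, ry = midranks(x), midranks(y)
    rx -= rx.mean()
    ry -= ry.mean()
    return (rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry))

# Discrete (count) margin paired with a continuous margin
rng = np.random.default_rng(0)
z = rng.standard_normal(500)
counts = np.floor(np.maximum(z + 2, 0)).astype(int)   # heavily tied
value = z + 0.3 * rng.standard_normal(500)            # continuous
rho = spearman_rho(counts, value)
```

With heavy ties the attainable range of this statistic shrinks, which is precisely why a copula-based population version and its relation to Kendall's tau need separate treatment in the discrete case.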
60

Efficient Uncertainty quantification with high dimensionality

Jianhua Yin (12456819) 25 April 2022 (has links)
Uncertainty exists everywhere in scientific and engineering applications. To avoid potential risk, it is critical to understand the impact of uncertainty on a system by performing uncertainty quantification (UQ) and reliability analysis (RA). However, the computational cost may be unaffordable using current UQ methods with high-dimensional input. Moreover, current UQ methods are not applicable when numerical data and image data coexist.

To decrease the computational cost to an affordable level and enable UQ with special high-dimensional data (e.g., images), this dissertation develops three UQ methodologies for high-dimensional input spaces. The first two methods focus on high-dimensional numerical input. The core strategy of Methodology 1 is to fix the unimportant variables at their first-step most probable point (MPP) so that the dimensionality is reduced; an accurate RA method is then used in the reduced space, and the final reliability is obtained by accounting for the contributions of both important and unimportant variables. Methodology 2 addresses the situation where the dimensionality cannot be reduced because most of the variables are important or the variables contribute equally to the system. It develops an efficient surrogate modeling method for high-dimensional UQ using Generalized Sliced Inverse Regression (GSIR), Gaussian Process (GP)-based active learning, and importance sampling: a cost-efficient GP model is built in the latent space after dimension reduction by GSIR, and the failure boundary is identified through active learning that adds optimal training points iteratively. In Methodology 3, a Convolutional Neural Network-based surrogate model (CNN-GP) is constructed for mixed numerical and image data. The numerical data are first converted into images, which are merged with the existing image data, and the merged images are fed to the CNN for training. The latent variables of the CNN model are then used to integrate the CNN with a GP, quantifying the model error as epistemic uncertainty. Both epistemic and aleatory uncertainty are considered in uncertainty propagation.

The simulation results indicate that the first two methodologies not only improve efficiency but also maintain adequate accuracy for problems with high-dimensional numerical input. GSIR with active learning can handle situations where the dimensionality cannot be reduced because most of the variables are important or their importances are close, and the two methodologies can be combined into a two-stage dimension reduction for high-dimensional numerical input. The third method, CNN-GP, is capable of dealing with special high-dimensional input, namely mixed numerical and image data, with satisfying regression accuracy and an estimate of the model error. Uncertainty propagation considering both epistemic and aleatory uncertainty provides better accuracy. The proposed methods could potentially be applied to engineering design and decision making.
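The GP-based active learning loop at the heart of Methodology 2 can be sketched compactly: fit a GP on a few points in the (already reduced) latent space, then repeatedly add the candidate with the largest posterior variance. This is a minimal generic sketch, not the dissertation's implementation; the RBF kernel, its lengthscale, the pure variance criterion, and the 1-D test function are all illustrative assumptions (the actual method targets the failure boundary with a more refined criterion).

```python
import numpy as np

def rbf(A, B, length=0.5):
    # Squared-exponential kernel between two point sets
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_posterior(Xtr, ytr, Xte, noise=1e-6, length=0.5):
    # Standard GP regression posterior mean and pointwise variance
    K = rbf(Xtr, Xtr, length) + noise * np.eye(len(Xtr))
    Ks = rbf(Xte, Xtr, length)
    mean = Ks @ np.linalg.solve(K, ytr)
    var = 1.0 - np.einsum("ij,ij->i", Ks, np.linalg.solve(K, Ks.T).T)
    return mean, np.maximum(var, 0.0)

def active_gp(f, candidates, n_init=5, n_add=15, seed=0):
    # Start from a few random points, then repeatedly add the candidate
    # with the largest posterior variance (pure exploration criterion)
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(candidates), n_init, replace=False))
    for _ in range(n_add):
        Xtr = candidates[idx]
        _, var = gp_posterior(Xtr, f(Xtr), candidates)
        idx.append(int(np.argmax(var)))
    Xtr = candidates[idx]
    return gp_posterior(Xtr, f(Xtr), candidates)[0]

# One latent variable, smooth limit-state-style response
f = lambda X: np.sin(3 * X[:, 0]) + 0.5 * X[:, 0]
grid = np.linspace(-2, 2, 200).reshape(-1, 1)
pred = active_gp(f, grid)
```

After only 20 evaluations the surrogate tracks the response closely over the whole domain, which is the efficiency argument for running the GP in a low-dimensional latent space rather than the original high-dimensional input.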
