91

Statistical Inferences under a semiparametric finite mixture model

Zhang, Shiju January 2005 (has links)
No description available.
92

Extending Growth Mixture Models and Handling Missing Values via Mixtures of Non-Elliptical Distributions

Wei, Yuhong January 2017 (has links)
Growth mixture models (GMMs) are used to model intra-individual change and inter-individual differences in change and to detect underlying group structure in longitudinal studies. Typically, these models are fitted under the assumption of normality, an assumption that is frequently invalid. To this end, this thesis focuses on the development of novel non-elliptical growth mixture models to better fit real data. Two non-elliptical growth mixture models, via the multivariate skew-t distribution and the generalized hyperbolic distribution, are developed and applied to simulated and real data. Furthermore, these two non-elliptical growth mixture models are extended to accommodate missing values, which are near-ubiquitous in real data. Recently, finite mixtures of non-elliptical distributions have flourished and facilitated the flexible clustering of data featuring longer tails and asymmetry. However, in practice, real data often have missing values, and so work in this direction is also pursued. A novel approach, via mixtures of the generalized hyperbolic distribution and mixtures of the multivariate skew-t distributions, is presented to handle missing values in the mixture model-based clustering context. To increase parsimony, families of mixture models have been developed by imposing constraints on the component scale matrices whenever missing data occur. Next, a mixture of generalized hyperbolic factor analyzers model is also proposed to cluster high-dimensional data with different patterns of missing values. Two missingness indicator matrices are also introduced to ease the computational burden. The algorithms used for parameter estimation are presented, and the performance of the methods is illustrated on simulated and real data. / Thesis / Doctor of Philosophy (PhD)
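To make the mixture-modelling workflow described above concrete, here is a minimal Python sketch of model-based clustering of simulated longitudinal trajectories. It uses scikit-learn's Gaussian mixtures only as a stand-in: the thesis builds skew-t and generalized hyperbolic components and handles missing values inside the EM algorithm, neither of which this sketch does, and the simulated two-group growth data are purely illustrative.

    # A minimal model-based clustering sketch with Gaussian components.
    # The thesis uses skew-t and generalized hyperbolic components and handles
    # missing values inside EM; scikit-learn's GaussianMixture does neither,
    # so this only illustrates the general mixture-model workflow.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    # Simulated longitudinal trajectories (rows = subjects, columns = time points),
    # two latent groups with different growth patterns -- illustrative data only.
    t = np.arange(5)
    group1 = 1.0 + 0.5 * t + rng.normal(scale=0.3, size=(100, 5))
    group2 = 3.0 - 0.2 * t + rng.normal(scale=0.3, size=(100, 5))
    X = np.vstack([group1, group2])

    # Fit mixtures with 1-4 components and pick the number of groups by BIC.
    fits = {g: GaussianMixture(n_components=g, covariance_type="full",
                               random_state=0).fit(X) for g in range(1, 5)}
    best_g = min(fits, key=lambda g: fits[g].bic(X))
    labels = fits[best_g].predict(X)
    print("chosen number of groups:", best_g)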
93

On Clustering: Mixture Model Averaging with the Generalized Hyperbolic Distribution

Ricciuti, Sarah 11 1900 (has links)
Cluster analysis is commonly described as the classification of unlabeled observations into groups such that they are more similar to one another than to observations in other groups. Model-based clustering assumes that the data arise from a statistical (mixture) model and typically a group of many models are fit to the data, from which the `best' model is selected by a model selection criterion (often the BIC in mixture model applications). This chosen model is then the only model that is used for making inferences on the data. Although this is common practice, proceeding in this way ignores a large component of model selection uncertainty, especially for situations where the difference between the model selection criterion for two competing models is relatively insignificant. For this reason, recent interest has been placed on selecting a subset of models that are close to the selected best model and using a weighted averaging approach to incorporate information from multiple models in this set. Model averaging is not a novel approach, yet its presence in a clustering framework is minimal. Here, we use Occam's window to select a subset of models eligible for two types of averaging techniques: averaging a posteriori probabilities, and direct averaging of model parameters. The efficacy of these model-based averaging approaches is demonstrated for a family of generalized hyperbolic mixture models using real and simulated data. / Thesis / Master of Science (MSc)
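As a rough illustration of the averaging scheme described above, the following Python sketch applies Occam's window and BIC-based weights to average a posteriori membership probabilities across several fitted mixtures. Gaussian mixtures stand in for the generalized hyperbolic mixtures studied in the thesis; the candidate model set, the window width of 10, and the mean-based matching of components across models are all assumptions made for the example.

    # A rough sketch of BIC-weighted averaging of a posteriori membership
    # probabilities over an Occam's window. Gaussian mixtures stand in for the
    # generalized hyperbolic mixtures used in the thesis; the candidate set, the
    # window width, and the mean-based component matching are assumptions.
    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from sklearn.datasets import load_iris
    from sklearn.mixture import GaussianMixture

    X = load_iris().data
    G = 3  # fix the number of components so posteriors have the same shape

    # Candidate models differ only in covariance structure.
    covs = ("full", "tied", "diag", "spherical")
    models = [GaussianMixture(n_components=G, covariance_type=c,
                              random_state=0).fit(X) for c in covs]
    bic = np.array([m.bic(X) for m in models])

    # Occam's window: keep models whose BIC lies within `width` of the best model.
    width = 10.0
    keep = np.where(bic - bic.min() < width)[0]

    # Weights proportional to exp(-BIC/2), approximate posterior model probabilities.
    w = np.exp(-0.5 * (bic[keep] - bic[keep].min()))
    w /= w.sum()

    ref = models[int(np.argmin(bic))]

    def aligned_proba(m):
        """Posterior probabilities with components matched to the reference model."""
        cost = np.linalg.norm(m.means_[:, None, :] - ref.means_[None, :, :], axis=2)
        row, col = linear_sum_assignment(cost)  # model component row[k] <-> ref component col[k]
        P = m.predict_proba(X)
        aligned = np.empty_like(P)
        aligned[:, col] = P[:, row]
        return aligned

    # Averaged posterior probabilities and the resulting clustering.
    post = sum(wi * aligned_proba(models[i]) for wi, i in zip(w, keep))
    labels = post.argmax(axis=1)
    print("kept:", [covs[i] for i in keep], "weights:", np.round(w, 3))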
94

Statistical Methods for Variability Management in High-Performance Computing

Xu, Li 15 July 2021 (has links)
High-performance computing (HPC) variability management is an important topic in computer science. Research topics include experimental designs for efficient data collection, surrogate models for predicting the performance variability, and system configuration optimization. Due to the complex architecture of HPC systems, a comprehensive study of HPC variability needs large-scale datasets, and experimental design techniques are useful for improved data collection. Surrogate models are essential to understand the variability as a function of system parameters, which can be obtained by mathematical and statistical models. After predicting the variability, optimization tools are needed for future system designs. This dissertation focuses on HPC input/output (I/O) variability through three main chapters. After the general introduction in Chapter 1, Chapter 2 focuses on the prediction models for the scalar description of I/O variability. A comprehensive comparison study is conducted, and major surrogate models for computer experiments are investigated. In addition, a tool is developed for system configuration optimization based on the chosen surrogate model. Chapter 3 conducts a detailed study of the multimodal phenomena in I/O throughput distribution and proposes an uncertainty estimation method for the optimal number of runs for future experiments. Mixture models are used to identify the number of modes for throughput distributions at different configurations. This chapter also addresses the uncertainty in parameter estimation and derives a formula for sample size calculation. The developed method is then applied to HPC variability data. Chapter 4 focuses on the prediction of functional outcomes with both qualitative and quantitative factors. Instead of a scalar description of I/O variability, the distribution of I/O throughput provides a comprehensive description of I/O variability. We develop a modified Gaussian process for functional prediction and apply the developed method to the large-scale HPC I/O variability data. Chapter 5 contains some general conclusions and areas for future work. / Doctor of Philosophy / This dissertation focuses on three projects that are all related to statistical methods in performance variability management in high-performance computing (HPC). HPC systems are computer systems that create high performance by aggregating a large number of computing units. The performance of HPC is measured by the throughput of a benchmark called the IOZone Filesystem Benchmark. The performance variability is the variation among throughputs when the system configuration is fixed. Variability management involves studying the relationship between performance variability and the system configuration. In Chapter 2, we use several existing prediction models to predict the standard deviation of throughputs given different system configurations and compare the accuracy of predictions. We also conduct HPC system optimization using the chosen prediction model as the objective function. In Chapter 3, we use the mixture model to determine the number of modes in the distribution of throughput under different system configurations. In addition, we develop a model to determine the number of additional runs for future benchmark experiments. In Chapter 4, we develop a statistical model that can predict the throughput distributions given the system configurations. We also compare the prediction of summary statistics of the throughput distributions with existing prediction models.
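As a small illustration of the Chapter 3 idea of identifying the number of modes in a throughput distribution with a mixture model, the Python sketch below fits one-dimensional Gaussian mixtures with increasing numbers of components and selects among them by BIC. The simulated throughputs and the candidate range of components are assumptions, not the dissertation's data or exact procedure.

    # A minimal sketch of estimating the number of modes in an I/O throughput
    # distribution: fit 1-D Gaussian mixtures and select the number of components
    # by BIC. The simulated throughputs and candidate range are assumptions.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)

    # Simulated throughputs for one system configuration: a bimodal distribution.
    throughput = np.concatenate([rng.normal(200.0, 10.0, 60),
                                 rng.normal(260.0, 8.0, 40)]).reshape(-1, 1)

    bics = {}
    for k in range(1, 6):
        gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(throughput)
        bics[k] = gm.bic(throughput)

    n_modes = min(bics, key=bics.get)
    print("estimated number of modes:", n_modes)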
95

Statistical Analysis of Gene Expression Profile: Transcription Network Inference and Sample Classification

Bing, Nan 21 April 2004 (has links)
The copious information generated from transcriptomes gives us an opportunity to learn biological processes as integrated systems; however, due to numerous sources of variation, high dimensions of data structure, various levels of data quality, and different formats of the inputs, dissecting and interpreting such data presents daunting challenges to scientists. The goal of this research is to provide improved and new statistical tools for analyzing transcriptomes data to identify gene expression patterns for classifying samples, to discover regulatory gene networks using natural genetic perturbations, to develop statistical methods for model fitting and comparison of biochemical networks, and eventually to advance our capability to understand the principles of biological processes at the system level. / Ph. D.
96

Speaker Identification and Verification Using Line Spectral Frequencies

Raman, Pujita 17 June 2015 (has links)
State-of-the-art speaker identification and verification (SIV) systems provide near-perfect performance under clean conditions. However, their performance deteriorates in the presence of background noise. Many feature compensation, model compensation and signal enhancement techniques have been proposed to improve the noise-robustness of SIV systems. Most of these techniques require extensive training, are computationally expensive or make assumptions about the noise characteristics. There has not been much focus on analyzing the relative importance, or speaker-discriminative power, of different speech zones, particularly under noisy conditions. In this work, an automatic, text-independent speaker identification (SI) system and a speaker verification (SV) system are proposed using Line Spectral Frequency (LSF) features. The performance of the proposed SI and SV systems is evaluated under various types of background noise. A score-level fusion based technique is implemented to extract complementary information from static and dynamic LSF features. The proposed score-level fusion based SI and SV systems are found to be more robust under noisy conditions. In addition, we investigate the speaker-discriminative power of different speech zones such as vowels, non-vowels and transitions. Rapidly varying regions of speech such as consonant-vowel transitions are found to be most speaker-discriminative in high SNR conditions. Steady, high-energy vowel regions are robust against noise and are hence most speaker-discriminative in low SNR conditions. We show that selectively utilizing features from a combination of transition and steady vowel zones further improves the performance of the score-level fusion based SI and SV systems under noisy conditions. / Master of Science
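For readers unfamiliar with the features used above, the following Python sketch converts a given LPC polynomial into line spectral frequencies by finding the unit-circle roots of the associated symmetric and antisymmetric polynomials. The example LPC coefficients are illustrative only; the thesis's full front end (frame-level LPC estimation, dynamic features, and score-level fusion) is not reproduced here.

    # A minimal sketch of converting LPC coefficients to Line Spectral Frequencies
    # (LSFs). The example LPC polynomial is illustrative only; a real SIV front end
    # would estimate LPCs from speech frames and append dynamic (delta) features.
    import numpy as np

    def lpc_to_lsf(a):
        """LSFs (radians, ascending) for LPC coefficients a = [1, a1, ..., ap]."""
        a = np.asarray(a, dtype=float)
        # Symmetric and antisymmetric polynomials P(z) and Q(z) of order p + 1.
        ext = np.concatenate([a, [0.0]])
        p_poly = ext + ext[::-1]
        q_poly = ext - ext[::-1]
        # For a stable LPC filter, the roots of P and Q lie on the unit circle;
        # the LSFs are the angles of the non-trivial roots in (0, pi), interleaved.
        angles = []
        for poly in (p_poly, q_poly):
            ang = np.angle(np.roots(poly))
            angles.extend(ang[(ang > 1e-6) & (ang < np.pi - 1e-6)])
        return np.sort(np.array(angles))

    # A stable 4th-order LPC polynomial, for illustration.
    a = np.array([1.0, -1.6, 1.2, -0.5, 0.1])
    print(np.round(lpc_to_lsf(a), 4))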
97

Semiparametric Bayesian Kernel Survival Model for Highly Correlated High-Dimensional Data.

Zhang, Lin 01 May 2018 (has links)
We are living in an era in which many mysteries related to science, technology, and design can be answered by "learning" the huge amount of data accumulated over the past few decades. In the process of those endeavors, highly-correlated high-dimensional data are frequently observed in many areas including predicting shelf life, controlling manufacturing processes, and identifying important pathways related to diseases. We define a "set" as a group of highly-correlated high-dimensional (HCHD) variables that possess a certain practical meaning or control a certain process, and define an "element" as one of the HCHD variables within a certain set. Such an elements-within-a-set structure is very complicated because: (i) the dimensions of elements in different sets can vary dramatically, ranging from two to hundreds or even thousands; (ii) the true relationships, including element-wise associations, set-wise interactions, and element-set interactions, are unknown; and (iii) the sample size (n) is usually much smaller than the dimension of the elements (p). The goal of this dissertation is to provide a systematic way to identify both the set effects and the element effects associated with survival outcomes from heterogeneous populations using Bayesian survival kernel models. By connecting kernel machines with semiparametric Bayesian hierarchical models, the proposed unified model frameworks can identify significant elements as well as sets regardless of mis-specifications of distributions or kernels. The proposed methods can potentially be applied to a vast range of fields to solve real-world problems. / PHD
98

Domain Adaptation with a Classifier Trained by Robust Pseudo-Labels

Zhou, Yunke 07 January 2022 (has links)
With the rapid growth of computing power, approaches based on deep learning algorithms have achieved remarkable results in solving computer vision classification problems. These performance improvements are achieved by assuming the source and target data are collected from the same probability distribution. However, this assumption is usually too strict to be satisfied in many real-world applications, such as big data analysis, natural language processing, and computer vision classification problems. Because of distribution discrepancies between these domains, directly training the model on the source domain cannot be expected to generate satisfactory results on the target domain. Therefore, the problem of minimizing these data distribution discrepancies is the main challenge with which modern machine learning is now faced. To address this problem, domain adaptation (DA) aims to identify domain-invariant features between two different but related domains. This thesis proposes a state-of-the-art DA approach that overcomes the limitations of traditional DA methods. To capture fine-grained information for each category, I deploy centroid-to-centroid alignment to perform domain adaptation. An Exponential Moving Average strategy (EMA) is used to ensure we can form robust source and target centroids. A Gaussian-uniform mixture model is trained using an Expectation-Maximization (EM) algorithm to infer the robustness of the target pseudo-labels. With the help of target pseudo-labels, I propose two novel types of classifiers: (1) a target-oriented classifier (TO); and (2) a centroid-oriented classifier (CO). Extensive experiments show that these two classifiers exhibit superior performance on a variety of DA benchmarks when compared to standard baseline methods. / Master of Science / Approaches based on deep learning algorithms have achieved remarkable results in solving computer vision classification problems. These performance improvements are achieved by assuming the source and target data are collected from the same probability distribution; however, in many real-world applications, such as big data analysis, natural language processing, and computer vision classification problems, this assumption is usually too strict to be satisfied. For example, these two domains may have the same types of classes, but the objects in each category of these different domains can vary in shape, color, background, or even illumination. Because the probability distributions are slightly mismatched, directly training the model on one domain cannot achieve a satisfactory result on the other domain. To address this problem, domain adaptation (DA) aims to extract common features on both domains to transfer knowledge from one domain to another. In this thesis, I propose a state-of-the-art DA approach that overcomes the limitation of the traditional DA methods. To capture the low-level information of each category, I deploy centroid-to-centroid alignment to perform domain adaptation. An Exponential Moving Average (EMA) strategy is used to ensure the generation of robust centroids. A Gaussian-Uniform Mixture model is trained by using the Expectation-Maximization (EM) algorithm to infer the robustness of the target sample pseudo-labels. With the help of robust target pseudo-labels, I propose two novel types of classifiers: (1) a target-oriented classifier (TO); and (2) a centroid-oriented classifier (CO). 
Extensive experiments show that the proposed method outperforms traditional baseline methods on various DA benchmarks.
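To sketch how a Gaussian-uniform mixture fitted by EM can grade pseudo-label reliability as described above, the short Python example below runs EM on per-sample scores (for instance, distances to the assigned class centroid) and keeps samples with high Gaussian responsibility. The simulated scores, initialization, and cut-off of 0.5 are assumptions, not the thesis's exact formulation.

    # A minimal EM sketch for a Gaussian-uniform mixture over per-sample scores
    # (e.g., distance of a target sample to its assigned class centroid). Samples
    # with high Gaussian responsibility are treated as having reliable
    # pseudo-labels. The simulated scores and parameter choices are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    # 80% "clean" samples with small scores, 20% noisy samples spread uniformly.
    scores = np.concatenate([rng.normal(1.0, 0.3, 800), rng.uniform(0.0, 6.0, 200)])

    lo, hi = scores.min(), scores.max()
    u_dens = 1.0 / (hi - lo)                            # density of the uniform component
    mu, sigma, pi_g = scores.mean(), scores.std(), 0.5  # initial Gaussian parameters

    for _ in range(100):
        # E-step: responsibility of the Gaussian ("clean") component.
        g_dens = np.exp(-0.5 * ((scores - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        r = pi_g * g_dens / (pi_g * g_dens + (1 - pi_g) * u_dens)
        # M-step: update the Gaussian mean/std and the mixing proportion.
        pi_g = r.mean()
        mu = np.sum(r * scores) / r.sum()
        sigma = np.sqrt(np.sum(r * (scores - mu) ** 2) / r.sum()) + 1e-8

    robust = r > 0.5  # pseudo-labels considered reliable
    print("fraction kept:", robust.mean(), "mu:", round(mu, 3), "sigma:", round(sigma, 3))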
99

Application of Time Series Analysis in Video Background Subtraction

Cai, Yicheng January 2024 (has links)
This thesis presents statistical methods for video background subtraction. I introduce the problem and analyze it with different statistical methods, including histogram statistics and Gaussian mixture model methods. To take the study further, I develop a time series analysis approach to video background subtraction based on the Kalman filter and present the resulting predictions and evaluations.
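As one plausible reading of the Kalman-filter approach described above, the Python sketch below maintains a per-pixel scalar Kalman estimate of the background intensity and flags pixels far from the prediction as foreground. The noise variances, threshold, and input video path are illustrative assumptions rather than the thesis's actual settings.

    # A rough per-pixel Kalman-filter background model: each pixel's background
    # intensity is tracked as a scalar state, and pixels far from the predicted
    # background are marked as foreground. The noise variances, threshold, and
    # video path are illustrative assumptions.
    import numpy as np
    import cv2

    q, r_var, thresh = 1e-3, 4.0, 30.0   # process noise, measurement noise, fg threshold

    cap = cv2.VideoCapture("surveillance.mp4")   # any video file or camera index
    ok, frame = cap.read()
    if not ok:
        raise SystemExit("could not read the (assumed) input video")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float64)
    x, p = gray.copy(), np.full_like(gray, 1.0)  # state estimate and its variance

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        z = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float64)
        # Predict: the background is assumed static, so only the variance grows.
        p = p + q
        # Update: Kalman gain and per-pixel correction for background pixels only.
        k = p / (p + r_var)
        foreground = np.abs(z - x) > thresh
        bg = ~foreground
        x = np.where(bg, x + k * (z - x), x)
        p = np.where(bg, (1.0 - k) * p, p)
        cv2.imshow("foreground", (foreground * 255).astype(np.uint8))
        if cv2.waitKey(1) == 27:                 # Esc to quit
            break
    cap.release()
    cv2.destroyAllWindows()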
100

Bitrate Reduction Techniques for Low-Complexity Surveillance Video Coding

Gorur, Pushkar January 2016 (has links) (PDF)
High resolution surveillance video cameras are invaluable resources for effective crime prevention and forensic investigations. However, increasing communication bandwidth requirements of high definition surveillance videos are severely limiting the number of cameras that can be deployed. Higher bitrate also increases operating expenses due to higher data communication and storage costs. Hence, it is essential to develop low complexity algorithms which reduce data rate of the compressed video stream without affecting the image fidelity. In this thesis, a computer vision aided H.264 surveillance video encoder and four associated algorithms are proposed to reduce the bitrate. The proposed techniques are (I) Speeded up foreground segmentation, (II) Skip decision, (III) Reference frame selection and (IV) Face Region-of-Interest (ROI) coding. In the first part of the thesis, a modification to the adaptive Gaussian Mixture Model (GMM) based foreground segmentation algorithm is proposed to reduce computational complexity. This is achieved by replacing expensive floating point computations with low cost integer operations. To maintain accuracy, we compute periodic floating point updates for the GMM weight parameter using the value of an integer counter. Experiments show speedups in the range of 1.33 - 1.44 on standard video datasets where a large fraction of pixels are multimodal. In the second part, we propose a skip decision technique that uses a spatial sampler to sample pixels. The sampled pixels are segmented using the speeded up GMM algorithm. The storage pattern of the GMM parameters in memory is also modified to improve cache performance. Skip selection is performed using the segmentation results of the sampled pixels. In the third part, a reference frame selection algorithm is proposed to maximize the number of background Macroblocks (MB’s) (i.e. MB’s that contain background image content) in the Decoded Picture Buffer. This reduces the cost of coding uncovered background regions. Distortion over foreground pixels is measured to quantify the performance of skip decision and reference frame selection techniques. Experimental results show bit rate savings of up to 94.5% over methods proposed in literature on video surveillance data sets. The proposed techniques also provide up to 74.5% reduction in compression complexity without increasing the distortion over the foreground regions in the video sequence. In the final part of the thesis, face and shadow region detection is combined with the skip decision algorithm to perform ROI coding for pedestrian surveillance videos. Since person identification requires high quality face images, MB’s containing face image content are encoded with a low Quantization Parameter setting (i.e. high quality). Other regions of the body in the image are considered as RORI (Regions of reduced interest) and are encoded at low quality. The shadow regions are marked as Skip. Techniques that use only facial features to detect faces (e.g. Viola Jones face detector) are not robust in real world scenarios. Hence, we propose to initially detect pedestrians using deformable part models. The face region is determined using the deformed part locations. Detected pedestrians are tracked using an optical flow based tracker combined with a Kalman filter. The tracker improves the accuracy and also avoids the need to run the object detector on already detected pedestrians. Shadow and skin detector scores are computed over super pixels. 
Bilattice-based logic inference is used to combine multiple likelihood scores and classify the super pixels as ROI, RORI, or RONI (regions of non-interest). The coding mode and QP values of the MB's are determined using the super pixel labels. The proposed techniques provide a further reduction in bitrate of up to 50.2%.
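As a rough sketch of a segmentation-driven skip decision in the spirit of the techniques above, the Python example below uses OpenCV's built-in MOG2 background subtractor in place of the proposed speeded-up integer GMM and marks a 16x16 macroblock as a Skip candidate when its foreground fraction is low. The video path, thresholds, and block size are assumptions for illustration.

    # A rough sketch of a segmentation-driven skip decision: OpenCV's MOG2
    # background subtractor stands in for the thesis's speeded-up integer GMM,
    # and a 16x16 macroblock is a Skip candidate when its foreground fraction is
    # low. The video path, thresholds, and block size are illustrative assumptions.
    import numpy as np
    import cv2

    MB, FG_FRAC = 16, 0.02                      # macroblock size and skip threshold
    bs = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                            detectShadows=True)
    cap = cv2.VideoCapture("surveillance.mp4")

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = bs.apply(frame)                  # 255 = foreground, 127 = shadow, 0 = bg
        fg = (mask == 255).astype(np.float32)
        h, w = fg.shape
        h, w = h - h % MB, w - w % MB           # crop to a whole number of macroblocks
        # Foreground fraction per 16x16 macroblock.
        blocks = fg[:h, :w].reshape(h // MB, MB, w // MB, MB).mean(axis=(1, 3))
        skip_mb = blocks < FG_FRAC              # True => encode this MB in Skip mode
        print(f"skip candidates: {skip_mb.mean():.1%} of macroblocks")
    cap.release()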
