Global ETD Search

1	Robust mixture regression models using t-distribution Wei, Yan January 1900 (has links) Master of Science / Department of Statistics / Weixin Yao / In this report, we propose a robust mixture of regression based on the t-distribution by extending the mixture of t-distributions proposed by Peel and McLachlan (2000) to the regression setting. This new mixture of regression model is robust to outliers in the y direction but not to outliers that are high leverage points. To combat this, we also propose a modified version of the method, which fits the mixture of regression based on the t-distribution to the data after adaptively trimming the high leverage points. We further propose to adaptively choose the degrees of freedom for the t-distribution using profile likelihood. The proposed robust mixture regression estimate has high efficiency due to the adaptive choice of degrees of freedom. We demonstrate the effectiveness of the proposed method and compare it with some of the existing methods through a simulation study. EM algorithm Mixture regression models Outliers Robust regression T-distribution Statistics (0463)
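As a rough illustration of the kind of estimator the abstract above describes, the following is a minimal sketch of an EM-type fit for a mixture of simple linear regressions with t-distributed errors. It is not the report's implementation: the function names (`t_pdf`, `weighted_line`, `em_t_mixreg`) are hypothetical, the degrees of freedom `nu` are held fixed rather than chosen by profile likelihood, there is a single predictor, and the leverage-trimming step is omitted.

```python
import math


def t_pdf(r, sigma, nu):
    """Density of a residual r under a Student-t with scale sigma and nu degrees of freedom."""
    z2 = (r / sigma) ** 2
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return math.exp(log_c - 0.5 * (nu + 1) * math.log1p(z2 / nu))


def weighted_line(x, y, w):
    """Weighted least squares for y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def em_t_mixreg(x, y, init, nu=3.0, n_iter=200):
    """Fit the mixture; init is a list of (pi, a, b, sigma) tuples, one per component."""
    params = list(init)
    n = len(x)
    for _ in range(n_iter):
        # E-step: component responsibilities and the latent scale weights u.
        # A large standardized residual gives a small u, which downweights outliers in y.
        resp, u = [], []
        for xi, yi in zip(x, y):
            dens = [p * t_pdf(yi - a - b * xi, s, nu) for p, a, b, s in params]
            total = sum(dens)
            resp.append([d / total for d in dens])
            u.append([(nu + 1) / (nu + ((yi - a - b * xi) / s) ** 2)
                      for _, a, b, s in params])
        # M-step: weighted least squares per component, then the scale update.
        updated = []
        for j in range(len(params)):
            pj = [r[j] for r in resp]
            w = [r[j] * ui[j] for r, ui in zip(resp, u)]
            a, b = weighted_line(x, y, w)
            s2 = sum(wi * (yi - a - b * xi) ** 2
                     for wi, xi, yi in zip(w, x, y)) / sum(pj)
            updated.append((sum(pj) / n, a, b, math.sqrt(s2)))
        params = updated
    return params
```

Calling `em_t_mixreg(x, y, [(0.5, a1, b1, s1), (0.5, a2, b2, s2)])` with rough starting values returns one `(pi, a, b, sigma)` tuple per component. Because the weights `u` shrink for large residuals, a few gross outliers in y have little influence on the fitted lines, which is the robustness property the abstract refers to.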
2	Robust mixture regression model fitting by Laplace distribution Xing, Yanru January 1900 (has links) Master of Science / Department of Statistics / Weixing Song / A robust estimation procedure for mixture linear regression models is proposed in this report by assuming that the error terms follow a Laplace distribution. The EM algorithm is implemented to conduct the estimation procedure of missing information based on the fact that the Laplace distribution is a scale mixture of a normal and a latent distribution. Finite sample performance of the proposed algorithm is evaluated by extensive simulation studies, together with comparisons with other existing procedures in the literature. A sensitivity study is also conducted based on a real data example to illustrate the application of the proposed method. EM algorithm Laplace distribution Least absolute deviation Mixture regression model Statistics (0463)
3	Mitigating predictive uncertainty in hydroclimatic forecasts: impact of uncertain inputs and model structural form Chowdhury, Shahadat Hossain, Civil & Environmental Engineering, Faculty of Engineering, UNSW January 2009 (has links) Hydrologic and climate models predict variables through a simplification of the underlying complex natural processes. Model development involves minimising predictive uncertainty. Predictive uncertainty arises from three broad sources which are measurement error in observed responses, uncertainty of input variables and model structural error. This thesis introduces ways to improve predictive accuracy of hydroclimatic models by considering input and structural uncertainties. The specific methods developed to reduce the uncertainty because of erroneous inputs and model structural errors are outlined below. The uncertainty in hydrological model inputs, if ignored, introduces systematic biases in the parameters estimated. This thesis presents a method, known as simulation extrapolation (SIMEX), to ascertain the extent of parameter bias. SIMEX starts by generating a series of alternate inputs by artificially adding white noise in increasing multiples of the known input error variance. The resulting alternate parameter sets allow formulation of an empirical relationship between their values and the level of noise present. SIMEX is based on the theory that the trend in alternate parameters can be extrapolated back to the notional error free zone. The case study relates to erroneous sea surface temperature anomaly (SSTA) records used as input variables of a linear model to predict the Southern Oscillation Index (SOI). SIMEX achieves a reduction in residual errors from the SOI prediction. Besides, a hydrologic application of SIMEX is demonstrated by a synthetic simulation within a three-parameter conceptual rainfall runoff model. 
This thesis next advocates reducing the structural uncertainty of any single model by combining multiple alternative model responses. Current approaches for combining hydroclimatic forecasts are generally limited to using combination weights that remain static over time. This research develops a methodology for combining forecasts from multiple models in a dynamic setting as an improvement over static weight combination. The model responses are mixed on a pairwise basis using mixing weights that vary in time, reflecting the persistence of individual model skills. The concept is referred to here as the pairwise dynamic weight combination. Two approaches for forecasting the dynamic weights are developed. The first approach uses a mixture of two basis distributions: a three-category ordered logistic regression model and a generalised linear autoregressive model. The second uses a modified nearest neighbour approach to forecast the future weights. These alternatives are first used to combine a univariate response forecast, the NINO3.4 SSTA index. This is followed by a similar combination for the entire global gridded SSTA forecast field. Results from these applications show significant improvements due to the dynamic model combination approach. The last application demonstrating the dynamic combination logic uses the dynamically combined multivariate SSTA forecast field as the basis for developing multi-site flow forecasts in the Namoi River catchment in eastern Australia. To further reduce structural uncertainty in the flow forecasts, three forecast models are formulated and the dynamic combination approach is applied again. The study demonstrates that the improved SSTA forecast (due to dynamic combination) in turn improves all three flow forecasts, while the dynamic combination of the three flow forecasts results in further improvements. 
Dynamic weight Uncertainty Model combination Input error SIMEX Flow forecast Sea surface temperature NINO3.4 Mixture regression
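The SIMEX idea described in the abstract above (add noise in increasing multiples of the known input error variance, then extrapolate the parameter trend back to the error-free case) can be sketched for a one-predictor linear model. This is an illustrative sketch only, not the thesis code: the names `ols_slope` and `simex_slope` are hypothetical, and the extrapolation is simplified to the exact quadratic through three noise levels (0, 1 and 2 times the error variance) evaluated at the notional level -1.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def simex_slope(w, y, err_var, n_rep=50, rng=None):
    """SIMEX-style slope estimate for an error-prone input w with known error variance."""
    rng = rng or random.Random(0)
    means = []
    for zeta in (0.0, 1.0, 2.0):
        if zeta == 0.0:
            # No extra noise: this is the naive (biased) estimate.
            means.append(ols_slope(w, y))
            continue
        # Simulation step: add white noise with variance zeta * err_var and refit,
        # averaging the slope over n_rep replicates.
        sd = (zeta * err_var) ** 0.5
        slopes = [ols_slope([wi + rng.gauss(0.0, sd) for wi in w], y)
                  for _ in range(n_rep)]
        means.append(sum(slopes) / n_rep)
    f0, f1, f2 = means
    # Extrapolation step: the quadratic through (0, f0), (1, f1), (2, f2),
    # evaluated at zeta = -1, equals 3*f0 - 3*f1 + f2.
    return 3 * f0 - 3 * f1 + f2
```

A quadratic extrapolant generally removes only part of the attenuation bias, so the result should be read as a bias-reduced estimate rather than an exact correction. Fitting the extrapolant by least squares over a finer grid of noise levels is a common variation.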
5	High-Dimensional Functional Graphs and Inference for Unknown Heterogeneous Populations Chen, Han 21 November 2024 (has links) In this dissertation, we develop innovative methods for analyzing high-dimensional, heterogeneous functional data, focusing specifically on uncovering hidden patterns and network structures within such complex data. We utilize functional graphical models (FGMs) to explore the conditional dependence structure among random elements. We mainly focus on the following three research projects. The first project combines the strengths of FGMs with finite mixture of regression models (FMR) to overcome the challenges of estimating conditional dependence structures from heterogeneous functional data. This novel approach facilitates the discovery of latent patterns, proving particularly advantageous for analyzing complex datasets, such as brain imaging studies of autism spectrum disorder (ASD). Through numerical analysis of both simulated data and real-world ASD brain imaging, we demonstrate the effectiveness of our methodology in uncovering complex dependencies that traditional methods may miss due to their homogeneous data assumptions. Secondly, we address the challenge of variable selection within FMR in high-dimensional settings by proposing a joint variable selection technique. This technique employs a penalized expectation-maximization (EM) algorithm that leverages shared structures across regression components, thereby enhancing the efficiency of identifying relevant predictors and improving the predictive ability. We further expand this concept to mixtures of functional regressions, employing a group lasso penalty for variable selection in heterogeneous functional data. Lastly, we recognize the limitations of existing methods in testing the equality of multiple functional graphs and develop a novel, permutation-based testing procedure. 
This method provides a robust, distribution-free approach to comparing network structures across different functional variables, as illustrated through simulation studies and functional magnetic resonance imaging (fMRI) analysis for ASD. Hence, these research works provide a comprehensive framework for functional data analysis, significantly advancing the estimation of network structures, functional variable selection, and testing of functional graph equality. This methodology holds great promise for enhancing our understanding of heterogeneous functional data and its practical applications. / Doctor of Philosophy / This study introduces innovative techniques for analyzing complex, high-dimensional functional data, such as functional magnetic resonance imaging (fMRI) data from the brain. Our goal is to reveal underlying patterns and network connections, particularly in the context of autism spectrum disorder (ASD). In functional data, we treat each signal curve from various locations as a single data point. These datasets are characterized by high dimensionality, with the number of model parameters exceeding the sample size. We employ functional graphical models (FGMs) to investigate the conditional dependencies among data elements. Our approach combines FGMs with finite mixture of regression models (FMR), allowing us to uncover hidden patterns that traditional methods assuming homogeneity might miss. Additionally, we introduce a new method for selecting relevant variables in high-dimensional regression contexts. This method enhances prediction accuracy by utilizing shared information among regression components. Furthermore, we develop a robust testing framework to facilitate the comparison of network structures between groups without relying on distribution assumptions. This enables precise evaluations of functional graphs. 
Hence, our research works contribute to a deeper understanding of complex, diverse functional data, paving the way for novel insights across various fields. Functional Data Graphical Model Joint Variable Selection Mixture Regression Permutation Test
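The permutation-based, distribution-free comparison of network structures mentioned in the abstract above can be illustrated with a generic two-group permutation test. The abstract does not give the actual test statistic for functional graphs, so the sketch below substitutes a simple scalar stand-in, the largest absolute difference between the two groups' sample correlation matrices. All names (`corr`, `max_corr_diff`, `permutation_pvalue`) are hypothetical.

```python
import random


def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def max_corr_diff(g1, g2):
    """Largest absolute difference between the groups' pairwise correlations.

    Each group is a list of subjects; each subject is a list of p variable values.
    """
    p = len(g1[0])
    cols1, cols2 = list(zip(*g1)), list(zip(*g2))
    return max(abs(corr(cols1[i], cols1[j]) - corr(cols2[i], cols2[j]))
               for i in range(p) for j in range(i + 1, p))


def permutation_pvalue(g1, g2, stat, n_perm=500, rng=None):
    """Permutation p-value for the hypothesis that both groups share one distribution."""
    rng = rng or random.Random(0)
    observed = stat(g1, g2)
    pooled = list(g1) + list(g2)
    n1 = len(g1)
    exceed = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        if stat(pooled[:n1], pooled[n1:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling itself, so the p-value is never zero.
    return (exceed + 1) / (n_perm + 1)
```

The test relies only on the exchangeability of subjects under the null hypothesis, not on a parametric model, which is what makes this family of procedures distribution-free. Any other statistic with the same two-argument signature can be passed as `stat`.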
6	Modèles de mélange pour la régression en grande dimension, application aux données fonctionnelles / High-dimensional mixture regression models, application to functional data Devijver, Emilie 02 July 2015 (has links) Les modèles de mélange pour la régression sont utilisés pour modéliser la relation entre la réponse et les prédicteurs, pour des données issues de différentes sous-populations. Dans cette thèse, on étudie des prédicteurs de grande dimension et une réponse de grande dimension. Tout d’abord, on obtient une inégalité oracle ℓ1 satisfaite par l’estimateur du Lasso. On s’intéresse à cet estimateur pour ses propriétés de régularisation ℓ1. On propose aussi deux procédures pour pallier ce problème de classification en grande dimension. La première procédure utilise l’estimateur du maximum de vraisemblance pour estimer la densité conditionnelle inconnue, en se restreignant aux variables actives sélectionnées par un estimateur de type Lasso. La seconde procédure considère la sélection de variables et la réduction de rang pour diminuer la dimension. Pour chaque procédure, on obtient une inégalité oracle, qui explicite la pénalité nécessaire pour sélectionner un modèle proche de l’oracle. On étend ces procédures au cas des données fonctionnelles, où les prédicteurs et la réponse peuvent être des fonctions. Dans ce but, on utilise une approche par ondelettes. Pour chaque procédure, on fournit des algorithmes, et on applique et évalue nos méthodes sur des simulations et des données réelles. En particulier, on illustre la première méthode par des données de consommation électrique. / Finite mixture regression models are useful for modeling the relationship between a response and predictors, arising from different subpopulations. In this thesis, we focus on high-dimensional predictors and a high-dimensional response. First of all, we provide an ℓ1-oracle inequality satisfied by the Lasso estimator. 
We focus on this estimator for its ℓ1-regularization properties rather than for the variable selection procedure. We also propose two procedures to deal with this issue. The first procedure estimates the unknown conditional mixture density by a maximum likelihood estimator, restricted to the relevant variables selected by an ℓ1-penalized maximum likelihood estimator. The second procedure jointly considers predictor selection and rank reduction to obtain lower-dimensional approximations of the parameter matrices. For each procedure, we get an oracle inequality, which derives the penalty shape of the criterion, depending on the complexity of the random model collection. We extend these procedures to the functional case, where predictors and responses are functions. For this purpose, we use a wavelet-based approach. For each situation, we provide algorithms, and we apply and evaluate our methods on both simulations and real datasets. In particular, we illustrate the first procedure on an electricity load consumption dataset. Modèles de mélange en régression Classification non supervisée Grande dimension Sélection de variables Sélection de modèles Inégalité oracle Données fonctionnelles Consommation électrique Ondelettes Mixture regression models Clustering High dimension Variable selection Model selection Oracle inequality Functional data Electricity consumption Wavelets
7	Automatic Speech Quality Assessment in Unified Communication : A Case Study / Automatisk utvärdering av samtalskvalitet inom integrerad kommunikation : en fallstudie Larsson Alm, Kevin January 2019 (has links) Speech as a medium for communication has always been important in its ability to convey our ideas, personality and emotions. It is therefore not strange that Quality of Experience (QoE) becomes central to any business relying on voice communication. Using Unified Communication (UC) systems, users can communicate with each other in several ways using many different devices, making QoE an important aspect for such systems. In this thesis, automatic methods for assessing the speech quality of voice calls in Briteback’s UC application are studied, including a comparison of the researched methods. Three methods are studied, all using a Gaussian Mixture Model (GMM) as a regressor, paired with extraction of Human Factor Cepstral Coefficients (HFCC), Gammatone Frequency Cepstral Coefficients (GFCC) and Modified Mel Frequency Cepstrum Coefficients (MMFCC) features respectively. The method based on HFCC feature extraction shows better performance in general than the other two methods, but all methods show low performance compared to the literature. This most likely stems from implementation errors, showing the difference between theory and practice in the literature, together with the lack of reference implementations. Further work with practical aspects in mind, such as reference implementations or verification tools, can make the field more popular and increase its use in the real world. speech voice communication qoe quality of experience unified communication uc speech quality assessment speech quality voice calls gaussian mixture model gmm gaussian mixture regression gmr mel frequency cepstrum coefficients mfcc human feature cepstrum coefficients hfcc gfcc Software Engineering Programvaruteknik
