Global ETD Search

11	A Comparison Of Some Robust Regression Techniques Avci, Ezgi 01 September 2009 Robust regression is a commonly required approach in industrial applications such as data mining, quality control and improvement, and finance. Among the robust regression methods, Least Median Squares, Least Trimmed Squares, M-regression, the MM-method, Least Absolute Deviations, Locally Weighted Scatter Plot Smoothing and Multivariate Adaptive Regression Splines are compared with each other and with Ordinary Least Squares under contaminated normal distributions, with respect to multiple outlier detection performance measures. For this comparison, a simulation study is performed in which parameters such as outlier density, outlier locations on the x-axis, sample size and number of independent variables are varied. Multiple outlier detection is evaluated with respect to the performance measures detection capability, false alarm rate, improved mean square error and ratio of improved mean square error. Based on the results of this simulation study, the three most competitive methods are then compared on an industrial data set with respect to the coefficient of multiple determination and mean square error.
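The abstract names detection capability and false alarm rate without defining them. A minimal sketch, assuming the common definitions (share of planted outliers that are flagged, and share of clean observations that are flagged) and a hypothetical flagging rule (absolute residual over a MAD-based scale above 2.5), could look like this; it is not the thesis's procedure, and the improved mean square error measures are not covered.

```python
import numpy as np


def flag_outliers(residuals, cutoff=2.5):
    """Flag observations whose standardized residual exceeds `cutoff` (assumed rule).

    The scale is the normal-consistent MAD, so a few gross outliers do not
    inflate it. Residuals are assumed to come from a robust fit centred near 0.
    """
    r = np.asarray(residuals, dtype=float)
    scale = 1.4826 * np.median(np.abs(r - np.median(r)))
    return np.abs(r) / scale > cutoff


def detection_metrics(flagged, is_outlier):
    """Return (detection capability, false alarm rate) for one simulated sample.

    detection capability: share of the planted outliers that were flagged
    false alarm rate:     share of the clean observations that were flagged
    """
    flagged = np.asarray(flagged, dtype=bool)
    is_outlier = np.asarray(is_outlier, dtype=bool)
    return float(flagged[is_outlier].mean()), float(flagged[~is_outlier].mean())


residuals = [0.1, -0.2, 0.0, 0.15, -0.1, 5.0]
truth = [False, False, False, False, True, True]
print(detection_metrics(flag_outliers(residuals), truth))  # (0.5, 0.0)
```

In a simulation study these two numbers would be averaged over many replications for each combination of outlier density, location, sample size and number of predictors.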
12	Statistical Analysis of Operational Data for Manufacturing System Performance Improvement Wang, Zhenrui January 2013 The performance of a manufacturing system relies on four types of elements: operators, machines, the computer system and the material handling system. To ensure the performance of these elements, operational data containing various aspects of information are collected for monitoring and analysis. This dissertation focuses on operator performance evaluation and machine failure prediction. The proposed research is motivated by the following challenges in analyzing operational data: (i) the complex relationships between variables, (ii) implicit information that is important for failure prediction, and (iii) data with outliers, missing values and erroneous measurements. To overcome these challenges, the following research has been conducted. To compare operator performance, a methodology combining regression modeling and a multiple comparisons technique is proposed. The regression model quantifies and removes the complex effects of other factors that affect operator performance. A robust zero-inflated Poisson (ZIP) model is developed to reduce the impact of excessive zeros and outliers in the performance metric, i.e. the number of defects (NoD), on the regression analysis. The model residuals are plotted in nonparametric statistical charts for performance comparison. The estimated model coefficients are also used to identify under-performing machines. To detect temporal patterns in operational data sequences, an algorithm is proposed for detecting interval-based asynchronous periodic patterns (APP). The algorithm detects patterns effectively and efficiently through a modified clustering method and a convolution-based template matching method.
To predict machine failures based on covariates with erroneous measurements, a new method is proposed for statistical inference of the proportional hazards model under a mixture of classical and Berkson errors. The method estimates the model coefficients with an expectation-maximization (EM) algorithm whose expectation step is computed by Monte Carlo simulation. A model estimated with the proposed method improves the accuracy of inference on machine failure probability. The research presented in this dissertation provides a package of solutions for improving manufacturing system performance. The effectiveness and efficiency of the proposed methodologies have been demonstrated with both numerical simulations and real-world case studies. Keywords: Hierarchical clustering; Measurement error; Multiple comparisons; Robust regression; Systems & Industrial Engineering; Expectation-maximization.
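The zero-inflated Poisson model mentioned in this entry mixes a point mass at zero with a Poisson count. The sketch below evaluates the standard (non-robust) ZIP log-likelihood with a single zero-inflation probability `pi` and Poisson mean `lam`; the robust estimation procedure developed in the dissertation, and any regression link for covariates, are not shown, and the function names are hypothetical.

```python
import math


def zip_logpmf(y, pi, lam):
    """Log-probability of count y under a zero-inflated Poisson model.

    pi  : probability of a structural zero (e.g. a defect-free unit)
    lam : mean of the Poisson count component
    """
    if y == 0:
        # a zero can come from either component
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    return math.log(1.0 - pi) - lam + y * math.log(lam) - math.lgamma(y + 1)


def zip_loglik(counts, pi, lam):
    """Log-likelihood of independent counts that share the same (pi, lam)."""
    return sum(zip_logpmf(y, pi, lam) for y in counts)


defects = [0, 0, 1, 3, 0, 2, 0, 0]   # hypothetical number-of-defects data
print(zip_loglik(defects, pi=0.3, lam=1.5))
```

Maximizing this log-likelihood over `pi` and `lam` gives the ordinary ZIP fit; a robust version would additionally limit the influence of outlying counts.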
13	Revisitando o problema de classificação de padrões na presença de outliers usando técnicas de regressão robusta / Revisiting the problem of pattern classification in the presence of outliers using robust regression techniques Ana Luiza Bessa de Paula Barros 09 August 2013 This thesis addresses the problem of data classification when the data are contaminated with atypical patterns. These patterns, generally called outliers, are omnipresent in real-world multivariate data sets, but detecting them a priori (i.e. before training the classifier) is a difficult task. As a result, the most common approach is a reactive one, in which the presence of outliers is suspected only after a previously trained classifier has performed poorly. Several strategies can then be adopted to improve the classifier's performance, such as choosing a more computationally powerful classifier and/or removing the detected outliers from the data, i.e. eliminating those patterns that are difficult to categorize correctly. Whatever the strategy adopted, the presence of outliers always requires more attention and care when designing a pattern classifier. With these difficulties in mind, this thesis revisits concepts and techniques from the theory of robust regression, in particular those related to M-estimation, and adapts them to the design of pattern classifiers that can handle outliers automatically. This adaptation leads to robust versions of two pattern classifiers widely used in the literature, namely the least squares classifier (LSC) and the extreme learning machine (ELM). Through a comprehensive set of computer experiments using synthetic and real-world data, it is shown that the proposed robust classifiers consistently outperform their original versions. Keywords: Pattern classification; Electrical Engineering.
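One common way to make a least squares classifier robust through M-estimation is iteratively reweighted least squares (IRLS). The sketch below assumes Huber weights with tuning constant `k = 1.345` and a MAD-based residual scale; these choices, and the helper names, are illustrative assumptions rather than the exact estimators used in the thesis, and the ELM variant is not shown.

```python
import numpy as np


def huber_weights(u, k=1.345):
    """IRLS weights psi(u)/u for Huber's loss: 1 for |u| <= k, k/|u| otherwise."""
    a = np.abs(u)
    return np.where(a <= k, 1.0, k / np.maximum(a, k))


def fit_robust_lsc(X, y, k=1.345, n_iter=100, tol=1e-8):
    """Linear least squares classifier (targets +1/-1) refitted by Huber IRLS."""
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])           # bias column + features
    w = np.linalg.lstsq(X1, y, rcond=None)[0]            # ordinary LSC as the start
    for _ in range(n_iter):
        r = y - X1 @ w
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD residual scale
        if scale == 0:
            break                                        # exact fit: nothing to reweight
        sw = np.sqrt(huber_weights(r / scale, k))        # square roots of IRLS weights
        w_new = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)[0]
        done = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if done:
            break
    return w


def predict(w, X):
    """Class label +1 or -1 from the sign of the linear discriminant."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.where(X1 @ w >= 0, 1, -1)


X = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
y = np.where(X > 0, 1.0, -1.0)
print(predict(fit_robust_lsc(X, y), X))
```

Starting from the ordinary least squares solution means the reweighting only changes the fit when some residuals are large relative to the robust scale, so on clean data the robust and original classifiers nearly coincide.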
14	Vybrané aspekty robustní regrese a srovnání metod robustní regrese / Selected aspects of robust regression and comparison of robust regression methods Černý, Jindřich January 2006 This dissertation examines robust regression methods. Its primary purpose is to propose an extension of the Theil-Sen regression estimator (referred to in some literature as the Passing-Bablok regression method) to multidimensional space, including its derivation, a summary and a computational algorithm, and to compare this method with other robust regression methods. The combination of these two objectives is the primary and original contribution of the dissertation. To the author's knowledge, the available literature does not show that anyone has discussed this problem in greater depth or solved it completely. This work therefore provides an overview of the issue and offers a new variant of this multidimensional, nonparametric, robust regression method. Secondary goals include a clear summary of other robust methods, a summary of findings related to these methods, and a comparison of the robust methods with each other, with emphasis on comparison with the proposed Theil-Sen method and with the least squares method. The summary also covers the mathematical context of the individual methods and their interchangeability. These secondary objectives are a further contribution of this dissertation to the field of robust regression; they are especially important for gaining a unified view of robust regression methods and estimators in general.
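For reference, the classical simple-regression Theil-Sen estimator takes the median of all pairwise slopes and then a median-based intercept (the intercept rule here is one common convention, assumed for illustration). The multidimensional extension proposed in the dissertation is not reproduced.

```python
from itertools import combinations

import numpy as np


def theil_sen(x, y):
    """Simple-regression Theil-Sen fit: median of pairwise slopes, then median intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]                  # pairs with equal x define no slope
    b = float(np.median(slopes))
    a = float(np.median(y - b * x))             # intercept: median of y - b*x
    return a, b


print(theil_sen([0, 1, 2, 3, 4], [1, 3, 5, 7, 100]))  # (1.0, 2.0)
```

Although the last response is a gross outlier, only four of the ten pairwise slopes involve it, so the median slope stays at the slope of the clean points.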
15	Robustní lineární regrese / Robust linear regression Rábek, Július January 2021 Regression analysis is one of the most extensively used statistical tools across different fields of science, and linear regression is its best-known method. However, the traditional procedure for obtaining linear model estimates, the least squares approach, is highly sensitive to even slight departures from the assumed modelling framework. This is especially pronounced when atypical values occur in the observed data. This lack of stability of the least squares approach is a serious problem in applications. The focus of this thesis therefore lies in assessing the available robust alternatives to least squares estimation, which are less easily affected by outlying values. First, we introduce linear regression model theory and derive the least squares method. Then, we characterise different types of unusual observations and outline some fundamental robustness measures. Next, we define and examine robust alternatives to classical estimation in linear regression models. Finally, we conduct a comprehensive simulation study comparing the performance of robust methods under different scenarios.
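As an example of a robust alternative to least squares, least trimmed squares (LTS) minimizes the sum of the `h` smallest squared residuals. The sketch below uses random elemental starts followed by concentration steps (C-steps) in the spirit of the FAST-LTS algorithm; the number of starts, the number of C-steps and the default `h` are assumptions chosen for illustration, not the settings of the thesis's simulation study.

```python
import numpy as np


def lts_fit(X, y, h=None, n_starts=50, n_csteps=20, seed=0):
    """Least trimmed squares: minimise the sum of the h smallest squared residuals.

    Random elemental subsets give starting fits; each start is refined by
    C-steps (refit OLS on the h observations with the smallest residuals).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    n, p = X1.shape
    if h is None:
        h = (n + p + 1) // 2                        # roughly half the data
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)  # elemental subset of p points
        beta = np.linalg.lstsq(X1[idx], y[idx], rcond=None)[0]
        for _ in range(n_csteps):
            keep = np.argsort((y - X1 @ beta) ** 2)[:h]
            beta = np.linalg.lstsq(X1[keep], y[keep], rcond=None)[0]
        obj = np.sort((y - X1 @ beta) ** 2)[:h].sum()   # trimmed objective
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta


x = np.arange(20.0)
y = 3.0 + 2.0 * x
y[15:] = 0.0                                        # 25% gross outliers
print(lts_fit(x, y))                                # close to [3. 2.]
```

Ordinary least squares on the same data would be pulled strongly toward the five zero responses, whereas the trimmed objective simply ignores them.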
16	Bayesian Restricted Likelihood Methods Lewis, John Robert January 2014 Keywords: Statistics; restricted likelihood; Markov chain Monte Carlo; M-estimation; robust regression.
17	Imputation of Missing Data with Application to Commodity Futures / Imputation av saknad data med tillämpning på råvaruterminer Östlund, Simon January 2016 In recent years, additional requirements have been imposed on financial institutions, including Central Counterparty clearing houses (CCPs), in an attempt to obtain quantitative measures of their exposure to different types of risk. One of these requirements is the need to perform stress tests that check resilience in a stressed market or crisis. However, financial markets develop over time, so some instruments traded today did not exist at the chosen date because they were introduced after the historical event considered. Building on current routines, the main goal of this thesis is to provide a more sophisticated method for imputing (filling in) missing historical data as preparatory work for stress testing. The models considered include two methods currently regarded as state-of-the-art techniques, based on maximum likelihood estimation (MLE) and multiple imputation (MI), together with a third, alternative approach involving copulas. The methods are applied to historical return data of commodity futures contracts from the Nordic energy market. Using conventional error metrics and the out-of-sample log-likelihood, it is in general very hard to distinguish the performance of the methods or to draw any conclusion about how good the models are relative to each other. Although the Student's t-distribution generally seems to be a more adequate assumption for the data than the normal distribution, all the models show quite poor performance. However, when the conditional distributions are analysed more thoroughly and each model is evaluated by extracting certain quantile values, the performance of each method improves significantly.
By comparing the models when imputing more extreme quantile values, it can be concluded that all methods produce satisfactory results, although the g-copula and t-copula models seem to be more robust than the respective linear models. Keywords: Missing Data; Bayesian Statistics; Conditional Distribution; Robust Regression; MCMC; Copulas; Probability Theory and Statistics.
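Under a multivariate normal assumption, imputing a missing return amounts to using its conditional distribution given the observed returns. The sketch below computes that conditional mean and covariance and extracts a quantile, mirroring the quantile-based evaluation described in this entry; the Student's t, multiple imputation and copula variants studied in the thesis are not covered, and `mu` and `cov` are assumed to have been estimated beforehand (e.g. by MLE).

```python
from statistics import NormalDist

import numpy as np


def conditional_normal(x, mu, cov):
    """Mean and covariance of the missing (NaN) entries of x given the observed
    entries, assuming x ~ N(mu, cov) with mu and cov already estimated."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    m = np.isnan(x)                        # missing components
    o = ~m                                 # observed components
    S_oo = cov[np.ix_(o, o)]
    S_mo = cov[np.ix_(m, o)]
    S_mm = cov[np.ix_(m, m)]
    K = np.linalg.solve(S_oo, S_mo.T).T    # S_mo @ inv(S_oo), without forming the inverse
    cond_mean = mu[m] + K @ (x[o] - mu[o])
    cond_cov = S_mm - K @ S_mo.T
    return cond_mean, cond_cov


def conditional_quantile(x, mu, cov, q, j=0):
    """q-quantile of the j-th missing entry under its conditional normal distribution."""
    mean, c = conditional_normal(x, mu, cov)
    return NormalDist(float(mean[j]), float(np.sqrt(c[j, j]))).inv_cdf(q)


x = [1.0, np.nan]                          # second series missing on this date
mu = [0.0, 0.0]
cov = [[1.0, 0.8], [0.8, 1.0]]
print(conditional_normal(x, mu, cov))      # mean ~[0.8], covariance ~[[0.36]]
print(conditional_quantile(x, mu, cov, 0.99))
```

Filling in the conditional mean gives a single point imputation, while extreme conditional quantiles are what matter for stress scenarios.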
18	Diagnóstico em regressão L1 / Diagnostic in L1 regression Rodrigues, Kévin Allan Sales 14 March 2019 This text presents an alternative regression method called L1 regression. This method is robust to outliers in the Y variable, whereas the traditional least squares method offers no robustness to this type of outlier. In this work we reanalyze the housing data presented by Narula and Wellington (1977) in the light of L1 regression. We illustrate the main inferential results, such as model interpretation, construction of confidence intervals and hypothesis tests for the parameters, and analysis of goodness-of-fit measures, and we also use diagnostic measures to highlight influential observations. Among the influence measures we use the likelihood displacement and the conditional likelihood displacement. Keywords: Diagnostic methods; Influence measures; LAD regression; Likelihood displacement; L1 regression; Robust regression.
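L1 (least absolute deviations) regression minimizes the sum of absolute residuals. A linear programming solver is the usual exact approach; the sketch below instead uses iteratively reweighted least squares with weights 1/|r_i|, floored at a small `eps` to avoid division by zero, which gives an approximate LAD fit. The iteration limit, tolerance and `eps` are assumptions.

```python
import numpy as np


def lad_irls(X, y, n_iter=500, tol=1e-9, eps=1e-8):
    """Approximate L1 (least absolute deviations) regression by IRLS.

    Each pass solves a weighted least squares problem with weights
    1 / max(|r_i|, eps), which down-weights observations with large residuals.
    """
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])          # intercept + predictors
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]        # start from OLS
    for _ in range(n_iter):
        r = y - X1 @ beta
        sw = 1.0 / np.sqrt(np.maximum(np.abs(r), eps))  # square roots of the weights
        beta_new = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


x = np.arange(5.0)
y = np.array([1.0, 3.0, 5.0, 7.0, 40.0])   # the last response is an outlier in Y
print(lad_irls(x, y))                      # approximately [1. 2.]
```

The outlier in Y barely moves the L1 fit, whereas least squares on the same data gives an intercept of -5.2 and a slope of 8.2.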
19	Detekce odlehlých a vlivných pozorování v lineární regresi v rámci metody nejmenších čtverců. Kvalitativní porovnání s postupy založenými na robustní regresi. / The methods for detection of the outliers and influential points based on method of least squares in linear regression analysis. The qualitative comparison with the detection methods based on robust regression. Potůčková, Lenka January 2013 This thesis deals with methods for detecting outliers and influential points based on the least squares method. The first part of the thesis summarizes the theory of the least squares method and of methods for detecting outliers and influential points, both those based on least squares and those based on robust regression. The practical part applies classical methods for detecting outliers and influential points to three types of datasets (artificial data, data from the specialized literature, and real data). The results are then compared qualitatively with those produced by detection methods for outliers and influential points based on robust regression.
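The classical least squares diagnostics referred to in this entry can be computed from the hat matrix. The sketch returns leverages, internally studentized residuals and Cook's distances; the cut-off used in the example (Cook's distance above 4/n) is a common rule of thumb, assumed here rather than taken from the thesis.

```python
import numpy as np


def influence_measures(X, y):
    """Leverages, internally studentized residuals and Cook's distances for an OLS fit."""
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])       # design matrix with intercept
    n, p = X1.shape
    H = X1 @ np.linalg.solve(X1.T @ X1, X1.T)        # hat matrix X (X'X)^{-1} X'
    lev = np.diag(H)                                 # leverages h_ii
    r = y - H @ y                                    # OLS residuals
    s2 = r @ r / (n - p)                             # residual variance estimate
    stud = r / np.sqrt(s2 * (1.0 - lev))
    cooks = stud ** 2 * lev / (p * (1.0 - lev))      # Cook's distance D_i
    return lev, stud, cooks


x = np.arange(10.0)
y = 1.0 + 2.0 * x
y[9] += 15.0                                     # shift the last (high-leverage) point
lev, stud, cooks = influence_measures(x, y)
print(np.flatnonzero(cooks > 4 / len(y)))        # [9]
```

Because these measures are built from the least squares fit itself, masking can occur when several outliers are present, which is the motivation for comparing them with detection based on robust regression.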
20	Profile Monitoring with Fixed and Random Effects using Nonparametric and Semiparametric Methods Abdel-Salam, Abdel-Salam Gomaa 20 November 2009 Profile monitoring is a relatively new approach in quality control, best suited to processes whose data follow a profile (or curve) in each time period. The essential idea of profile monitoring is to model the profile with a parametric, nonparametric, or semiparametric method and then monitor the fitted profiles or the estimated random effects over time to determine whether the profiles have changed. The majority of previous studies in profile monitoring focused on parametric modeling of either linear or nonlinear profiles, with both fixed and random effects, under the assumption of correct model specification. Our work considers cases where the parametric model for the family of profiles is unknown or at least uncertain. Consequently, we consider monitoring profiles with two techniques: a nonparametric technique and a semiparametric procedure that combines parametric and nonparametric profile fits, which we refer to as model robust profile monitoring (MRPM). For mixed effects models, the MMRPM method extends MRPM by applying a mixed model approach to both the parametric and nonparametric fits, to account for the correlation within profiles and to treat the collection of profiles as a random sample from a common population. For each case, we formulated two Hotelling's T² statistics, one based on the estimated random effects and one based on the fitted values, and obtained the corresponding control limits.
In addition, we used two different estimators of the variance-covariance matrix: one based on the pooled sample variance-covariance matrix and one based on successive differences. A Monte Carlo study was performed to compare the integrated mean square error (IMSE) and the probability of signal of the parametric, nonparametric, and semiparametric approaches. Both correlated and uncorrelated error structures were evaluated for varying amounts of model misspecification, numbers of profiles, numbers of observations per profile, shift locations, and in-control and out-of-control situations. For both uncorrelated and correlated scenarios, the semiparametric (MMRPM) method was competitive with, and often clearly superior to, the parametric and nonparametric methods over all levels of misspecification. For a correctly specified model, the IMSE and the simulated probability of signal of the parametric and MMRPM methods were identical (or nearly so). For severe model misspecification, the nonparametric and MMRPM methods were identical (or nearly so). For mild model misspecification, the MMRPM method was superior to both the parametric and nonparametric methods. This simulation therefore supports the claim that the MMRPM method is robust to model misspecification. In addition, the MMRPM method performed better for data sets with a correlated error structure. The performance of the nonparametric and MMRPM methods also improved as the number of observations per profile increased, since more observations over the same range of X generally allow the penalized spline method to use more knots, giving greater flexibility and better fits for the nonparametric curves and, consequently, the semiparametric curves. The parametric, nonparametric and semiparametric approaches were used to fit the relationship between the torque produced by an engine and engine speed in the automotive industry.
Then, we used a Hotelling's T² statistic based on the estimated random effects to conduct Phase I studies that identify outlying profiles. The parametric, nonparametric and semiparametric methods all showed that the process was stable. Although all three methods reach the same conclusion about the in-control status of each profile, the nonparametric and MMRPM results describe the actual behavior of each profile better. The nonparametric and MMRPM methods thus give the user a greater ability to interpret the true relationship between engine speed and torque for this type of engine and an increased likelihood of detecting unusual engines in future production. Finally, we conclude that the nonparametric and semiparametric approaches perform better than the parametric approach when the user's model is misspecified. The case study demonstrates that the proposed nonparametric and semiparametric methods are more efficient and flexible for Phase I profile monitoring in a practical application, and robust to the common problem of model misspecification. We also found that both the nonparametric and semiparametric methods yield charts with a good ability to detect changes in Phase I data and with easily calculated control limits. The proposed methods provide greater flexibility and efficiency than current parametric Phase I profile monitoring methods, which rely on correct model specification, an assumption that is unrealistic in many practical industrial applications. / Ph.D. Keywords: P-splines; Nonparametric; Semiparametric; Model Robust Regression; Model Misspecification; T² Control Chart; Model Robust Profile Monitoring; Parametric.
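The Phase I T² statistic for each profile can be computed from the monitored vectors (for example, the estimated random effects). The sketch implements two covariance estimators: the "pooled" one, taken here to be the ordinary sample covariance of all m vectors (an assumption), and the successive-differences estimator V'V / (2(m - 1)), where V stacks the differences of consecutive vectors. Profile fitting (e.g. penalized splines) and control limits are not included, and the function name is hypothetical.

```python
import numpy as np


def t2_statistics(Z, method="pooled"):
    """Phase I Hotelling T^2 value for each row of a 2-D array Z (one row per profile).

    method="pooled": ordinary sample covariance of all m rows (assumed meaning).
    method="sd":     successive-differences estimator V'V / (2(m - 1)),
                     where V holds the differences of consecutive rows.
    """
    Z = np.asarray(Z, dtype=float)
    m = Z.shape[0]
    if method == "pooled":
        S = np.atleast_2d(np.cov(Z, rowvar=False))
    elif method == "sd":
        V = np.diff(Z, axis=0)
        S = V.T @ V / (2.0 * (m - 1))
    else:
        raise ValueError("method must be 'pooled' or 'sd'")
    D = Z - Z.mean(axis=0)                       # deviations from the grand mean
    # Quadratic form d_i' S^{-1} d_i for every row, via a linear solve
    return np.sum(D * np.linalg.solve(S, D.T).T, axis=1)


Z = np.array([[1, 2], [2, 1], [3, 5], [4, 3], [5, 6], [0, 1]], dtype=float)
print(t2_statistics(Z).round(2))
print(t2_statistics(Z, method="sd").round(2))
```

With the sample-covariance estimator the T² values always sum to (m - 1)p, which is a quick sanity check; the successive-differences estimator is less inflated by a sustained shift partway through the sequence.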
