101 |
Discovery of Outlier Points and Dense Regions in Large Data-Sets Using Spark Environment
Nadella, Pravallika 04 October 2021
No description available.
|
102 |
Understanding Noise and Structure behind Metric Spaces
Wang, Dingkang 20 October 2021
No description available.
|
103 |
Robust Models for Accommodating Outliers in Random Effects Meta Analysis: A Simulation Study and Empirical Study
Stacey, Melanie January 2016
In traditional meta-analysis, a random-effects model is used to deal with heterogeneity, and the random effect is assumed to be normally distributed. However, this can be problematic in the presence of outliers. One solution is to use a heavy-tailed distribution for the random effect to model the excess variation due to the outliers more adequately. Failure to consider an alternative to the standard approach in the presence of unusual or outlying points can lead to inaccurate inference. A heavy-tailed distribution is favoured because it can down-weight outlying studies appropriately, so the removal of a study does not need to be considered.
In this thesis, the performance of the t-distribution and of a finite mixture model as alternatives to the normal distribution is assessed through a comprehensive simulation study. The parameters varied are the average mean of the non-outlier studies, the number of studies, the proportion of outliers, the heterogeneity and the outlier shift distance from the average mean. The performance of the distributions is measured using bias, mean squared error, coverage probability, coverage width, Type I error and power. The methods are also compared through an empirical study of meta-analyses from The Cochrane Library (2008).
The simulation showed that the performance of the alternative distributions is better than the normal distribution for a number of scenarios, particularly for extreme outliers and high heterogeneity. Generally, the mixture model performed quite well.
The empirical study reveals that both alternative distributions are able to reduce the influence of the outlying studies on the overall mean estimate and thus produce more conservative p-values than the normal distribution.
It is recommended that a practitioner consider the use of an alternative random-effects distribution in the presence of outliers because such distributions are more likely to provide robust results. / Thesis / Master of Science (MSc)
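As a point of reference for the normal random-effects model that the abstract treats as the standard approach, the following is a minimal Python sketch of the DerSimonian-Laird moment estimator, one common way to fit that model. It is not the t-distribution or mixture method studied in the thesis. The function name `dersimonian_laird` and all effect sizes and variances are made-up illustrative values. The sketch shows how a single outlying study inflates the estimated between-study variance and pulls the pooled mean.

```python
import math


def dersimonian_laird(effects, variances):
    """Pooled mean, its standard error, and tau^2 under a normal random-effects model."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect (inverse-variance) mean, needed for Cochran's Q.
    mu_fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - mu_fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    # Moment estimate of the between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    re_weights = [1.0 / (v + tau2) for v in variances]
    mu_random = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return mu_random, se, tau2


# Hypothetical data: four concordant studies, then the same set plus one outlier.
clean = dersimonian_laird([0.1, 0.1, 0.1, 0.1], [0.04] * 4)
contaminated = dersimonian_laird([0.1, 0.1, 0.1, 0.1, 2.0], [0.04] * 5)
print("without outlier:", clean)
print("with outlier:   ", contaminated)
```

Because the weights are equal in this toy example, the pooled mean moves all the way to the simple average once the outlier is added, which is the kind of sensitivity that motivates the heavy-tailed alternatives.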
|
104 |
Profile Monitoring for Mixed Model Data
Jensen, Willis Aaron 26 April 2006
The initial portion of this research focuses on appropriate parameter estimators within a general context of multivariate quality control. The goal of Phase I analysis of multivariate quality control data is to identify multivariate outliers and step changes so that the estimated control limits are sufficiently accurate for Phase II monitoring. High breakdown estimation methods based on the minimum volume ellipsoid (MVE) or the minimum covariance determinant (MCD) are well suited to detecting multivariate outliers in data. Because of the inherent difficulties in computation, many algorithms have been proposed to obtain these estimators. We consider the subsampling algorithm to obtain the MVE estimators and the FAST-MCD algorithm to obtain the MCD estimators. Previous studies have not clearly determined which of these two estimation methods is best for control chart applications. The comprehensive simulation study here gives guidance for when to use which estimator. Control limits are provided. High breakdown estimation methods such as MCD and MVE can be applied to a wide variety of multivariate quality control data.
The final, lengthier portion of this research considers profile monitoring. Profile monitoring is a relatively new technique in quality control used when the product or process quality is best represented by a profile (or a curve) at each time period. The essential idea is often to model the profile via some parametric method and then monitor the estimated parameters over time to determine if there have been changes in the profiles. Because the estimated parameters may be correlated, it is convenient to monitor them using a multivariate control method such as the T-squared statistic. Previous modeling methods have not incorporated the correlation structure within the profiles. We propose the use of mixed models (both linear and nonlinear) to monitor linear and nonlinear profiles in order to account for the correlation structure within a profile. We consider various data scenarios and show using simulation when the mixed model approach is preferable to an approach that ignores the correlation structure. Our focus is on Phase I control chart applications. / Ph. D.
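The T-squared statistic mentioned above can be illustrated with a small Python sketch for two-dimensional observations. This sketch uses the classical sample mean and covariance, not the MVE or MCD estimators the thesis evaluates, so it only shows the form of the statistic. The function name `t_squared_2d` and the data points are hypothetical.

```python
def t_squared_2d(points):
    """Hotelling-type T^2 distance of each 2-D point from the sample mean,
    using the classical sample covariance (divisor n - 1)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("sample covariance matrix is singular")
    stats = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # Quadratic form d' S^{-1} d written out for the 2x2 case.
        stats.append((dx * dx * syy - 2 * dx * dy * sxy + dy * dy * sxx) / det)
    return stats


# Hypothetical Phase I data: five similar observations and one unusual one.
data = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.9), (1.1, 2.2), (1.0, 2.0), (5.0, -1.0)]
values = t_squared_2d(data)
for point, value in zip(data, values):
    print(point, round(value, 3))
```

With classical estimators the outlier itself inflates the covariance estimate, which caps how large its own statistic can become. That masking tendency is one reason high breakdown estimators are considered for Phase I analysis.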
|
105 |
An investigation into price-quality tradeoffs: the effects of order of presentation and presentation of outlying alternatives
DeMoranville, Carol W. 06 June 2008
In virtually every buying decision, consumers must make tradeoffs among levels of product attributes. One of the most frequent kinds of tradeoffs is that between price and quality. This research investigates the effects of two controllable variables on price-quality tradeoffs: the price/quality order in which alternatives are presented and whether alternatives outside the range of buyer expectations (outliers) are presented. The level of the final reference point is suggested to mediate the effects of order and outliers on the dependent variables of evaluation, search length, price-quality choice, and satisfaction. In addition, the effects of order and outliers on buyer-seller relationship quality are examined. A measurement problem precluded determination of final reference point as a mediating variable, but the other effects were as predicted by final reference points. Presenting alternatives in a descending order of price and quality resulted in less search than an ascending order. Primacy effects were evident, as the descending order also resulted in choices of higher price and quality than both ascending and random orders. Moderate outliers also resulted in higher price-quality choices than either no outliers or extreme outliers, but only when presented in a descending order. There were no significant effects on evaluation of alternatives. Perceptions of the buyer-seller relationship were better when alternatives were presented in an ascending order than in a descending order, but were not affected by presenting outliers. Buyer satisfaction was lower when an extreme outlier was included in a choice set presented in an ascending order. / Ph. D.
|
106 |
Hydroxypropylmethylcellulose: A New Matrix for Solid-Surface Room-Temperature Phosphorimetry
Hamner, Vincent N. 05 November 1999
This thesis reports an investigation of hydroxypropylmethylcellulose (HPMC) as a new solid-surface room-temperature phosphorescence (SSRTP) sample matrix. The high background phosphorescence originating from filter paper substrates can interfere with the detection and quantitation of trace-level analytes. High-purity grades of HPMC were investigated as SSRTP substrates in an attempt to overcome this limitation. When compared directly to filter paper, HPMC allows the spectroscopist to achieve greater sensitivity, lower limits of detection (LOD), and lower limits of quantitation (LOQ) for certain phosphor/heavy-atom combinations since SSRTP signal intensities are stronger. For example, the determination of the analytical figures of merit for a naphthalene/sodium iodide/HPMC system resulted in a calibration sensitivity of 2.79, LOD of 4 ppm (3 ng), and LOQ of 14 ppm (11 ng). Corresponding investigations of a naphthalene/sodium iodide/filter paper system produced a calibration sensitivity of 0.326, LOD of 33 ppm (26 ng), and LOQ of 109 ppm (86 ng). Extended purging with dry-nitrogen gas yields improved sensitivities, lower LODs, and lower LOQs in HPMC matrices when LOD and LOQ are calculated according to the IUPAC guidelines.
To test the universality of HPMC, qualitative SSRTP spectra were obtained for a wide variety of probe phosphors offering different molecular sizes, shapes, and chemical functionalities. Suitable spectra were obtained for the following model polycyclic aromatic hydrocarbons (PAHs): naphthalene, p-aminobenzoic acid, acenaphthene, phenanthrene, 2-naphthoic acid, 2-naphthol, salicylic acid, and triphenylene.
Filter paper and HPMC substrates are inherently anisotropic, non-heterogeneous media. Since this deficiency cannot be addressed experimentally, a robust statistical method is examined for the detection of questionable SSRTP data points and the deletion of outlying observations. If discordant observations are discarded, relative standard deviations are typically reduced to less than 10% for most SSRTP data sets. Robust techniques for outlier identification are superior to traditional methods since they operate at a high level of efficiency and are immune to masking effects.
The process of selecting a suitable sample support material often involves considerable trial-and-error on the part of the analyst. A mathematical model based on Hansen's cohesion parameter theory is developed to predict favorable phosphor-substrate attraction and interactions. The results of investigations using naphthalene as a probe phosphor and sodium iodide as an external heavy-atom enhancer support the cohesion parameter model.
This document includes a thorough description of the fundamental principles of phosphorimetry and provides a detailed analysis of the theoretical and practical concerns associated with performing SSRTP. In order to better understand the properties of both filter paper and HPMC, a chapter is devoted to the discussion of the cellulose biopolymer. Experimental results and interpretations are presented and suggestions for future investigations are provided. Together, these results provide a framework that will support additional advancements in the field of solid-surface room-temperature phosphorescence spectroscopy. / Ph. D.
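The figures of merit quoted in this abstract can be related to each other with a short Python sketch. It assumes the widely used convention of LOD = 3 s_b / m and LOQ = 10 s_b / m, where s_b is the standard deviation of blank measurements and m is the calibration sensitivity (slope). The abstract does not state the exact formulas used, so this is an assumption, although the reported LOQ-to-LOD ratios of roughly 3.3 to 3.5 are consistent with it. The function names and all numeric data below are hypothetical.

```python
from statistics import mean, stdev


def calibration_slope(concentrations, signals):
    """Least-squares slope of signal versus concentration (calibration sensitivity)."""
    cx, cy = mean(concentrations), mean(signals)
    num = sum((x - cx) * (y - cy) for x, y in zip(concentrations, signals))
    den = sum((x - cx) ** 2 for x in concentrations)
    return num / den


def detection_limits(blank_signals, slope, k_lod=3.0, k_loq=10.0):
    """LOD and LOQ in concentration units, assuming the 3-sigma and 10-sigma convention."""
    s_blank = stdev(blank_signals)
    return k_lod * s_blank / slope, k_loq * s_blank / slope


# Hypothetical calibration standards (ppm) and phosphorescence intensities.
slope = calibration_slope([0.0, 10.0, 20.0, 30.0], [1.0, 21.0, 41.0, 61.0])
lod, loq = detection_limits([1.0, 1.2, 0.8, 1.0, 1.0], slope)
print("sensitivity:", slope)
print("LOD (ppm):", round(lod, 4), "LOQ (ppm):", round(loq, 4))
```

Under this convention a substrate with a higher sensitivity or a quieter blank yields lower limits, which matches the direction of the HPMC versus filter paper comparison reported above.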
|
107 |
Enhancing Network-Level Pavement Macrotexture Assessment
Bongioanni, Vincent Italo 30 April 2019
Pavement macrotexture has been shown to influence a range of safety and comfort issues including wet weather friction, splash and spray, ambient and in-vehicle noise, tire wear, and rolling resistance. While devices and general guidance exist to measure macrotexture, the wide-scale collection and use of macrotexture data is neither mandated nor typically practiced in the United States. This work seeks to improve upon the methods used to calibrate, collect, pre-process, and distill macrotexture data into useful information that can be utilized by pavement managers. This is accomplished by 1. developing a methodology to evaluate and compare candidate data collection devices; 2. developing plans and procedures to evaluate the accuracy of high-speed network data collection devices with reference surfaces and measurements; 3. developing a method to remove erroneous data from emerging 3-D macrotexture sensors; 4. developing a model to describe the change in macrotexture as a function of traffic; and 5. distilling the final collected pavement surface profiles into parameters for the prediction of the important pavement surface properties mentioned above. Various high-speed macrotexture measurement devices were shown to have good repeatability (between 0.06 and 0.09 mm mean profile depth, MPD), and interchangeability of single-spot laser devices was demonstrated via a limits of agreement analysis. The operational factors of speed and acceleration were shown to affect the resulting MPD of several devices, and guidelines are given for vehicle speed and sensor exposure settings. Devices with single-spot and line lasers were shown to reproduce reference waveforms on manufactured surfaces within predefined tolerances. A model was developed that predicts future macrotexture levels (as measured by RMS) for pavements prone to bleeding due to rich asphalt content. Finally, several previously published macrotexture parameters along with a suite of novel parameters were evaluated for their effectiveness in the prediction of wet weather friction and certain types of road noise. Many of the parameters evaluated outperformed the current metrics of MPD and RMS. / Doctor of Philosophy
|
108 |
Caracterización de medidas de regularidad en señales biomédicas. Robustez a outliers / Characterization of regularity measures in biomedical signals. Robustness to outliers
Molina Picó, Antonio 03 September 2014
Physiological systems generate electrical signals as they operate. These signals can be recorded and displayed, and they are a fundamental diagnostic aid in current clinical practice. However, visual inspection does not allow the information contained in these signals to be extracted in full. Among automatic processing techniques, nonlinear methods stand out, specifically those related to estimating the regularity of the underlying signal. In recent years these methods have yielded very significant results in this field. However, they are very sensitive to interference in the signals, and their diagnostic capability degrades significantly if the biomedical signals are contaminated. One element that appears fairly often in physiological recordings, and that contributes to this loss of performance in nonlinear estimators, is the short-duration impulse, known in this context as a spike.
This work addresses the problems associated with the presence of spikes in biosignals by characterizing their influence on a set of specific measures, so that possible degradation can be anticipated and the appropriate countermeasures applied. Specifically, the regularity measures characterized are Approximate Entropy (ApEn), Sample Entropy (SampEn), Lempel-Ziv Complexity (LZC) and Detrended Fluctuation Analysis (DFA). All of these methods have given satisfactory results in many previous studies on biomedical signal processing. The characterization is carried out through an exhaustive experimental study in which controlled spikes are applied to different physiological recordings, and the influence of those spikes on the resulting estimate is analyzed quantitatively and qualitatively.
The results show that the level of interference, as well as the parameters of the regularity measures, affect the estimates in widely varying ways. In general, LZC is the most robust to spikes of the measures characterized, while DFA is the most vulnerable. However, the ability to discriminate between classes is retained in many cases, despite the changes produced in the absolute entropy values. / Molina Picó, A. (2014). Caracterización de medidas de regularidad en señales biomédicas. Robustez a outliers [Doctoral thesis]. Editorial Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/39346
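To make the idea of injecting a controlled spike concrete, here is a minimal Python sketch of Sample Entropy applied to a toy signal with and without a single spike. It is not the experimental protocol of the thesis. The function name `sample_entropy`, the use of a fixed absolute tolerance `r` (in practice `r` is often set relative to the signal's standard deviation), the "less than or equal" matching rule, and the synthetic signal are all assumptions made for illustration.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r): -ln(A / B), where B counts pairs of length-m templates and
    A counts pairs of length-(m + 1) templates whose Chebyshev distance is <= r.
    Self-matches are excluded and both counts use the first len(x) - m templates."""
    n = len(x)

    def count_matches(length):
        templates = [x[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    matches += 1
        return matches

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)


# Toy signal: perfectly regular alternation, then the same signal with one spike.
clean = [0.0, 1.0] * 20
spiked = list(clean)
spiked[20] = 10.0  # hypothetical short-duration impulse

print("clean :", sample_entropy(clean, m=2, r=0.1))
print("spiked:", sample_entropy(spiked, m=2, r=0.1))
```

In this toy case the spike breaks a few template matches and raises the estimate slightly. With a tolerance tied to the standard deviation the effect can differ, which is in line with the abstract's remark that the influence depends on the parameters of the measure.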
|
109 |
Outliers and robust response surface designs
O'Gorman, Mary Ann January 1984
A commonly occurring problem in response surface methodology is that of inconsistencies in the response variable. These inconsistencies, or maverick observations, are referred to here as outliers. Many models exist for describing these outliers. Two of these models, the mean shift and the variance inflation outlier models, are employed in this research.
Several criteria are developed for determining when the outlying observation is detrimental to the analysis. These criteria all lead to the same condition which is used to develop statistical tests of the null hypothesis that the outlier is not detrimental to the analysis. These results are extended to the multiple outlier case for both models.
The robustness of response surface designs is also investigated. Robustness to outliers, missing data and errors in control are examined for first order models. The orthogonal designs with large second moments, such as the 2ᵏ factorial designs, are optimal in all three cases.
In the second order case, robustness to outliers and to missing data are examined. Optimal design parameters are obtained by computer for the central composite, Box-Behnken, hybrid, small composite and equiradial designs. Similar results are seen for both robustness to outliers and to missing data. The central composite turns out to be the optimal design type and of the two economical design types the small composite is preferred to the hybrid. / Ph. D.
|
110 |
Contribuições à análise de outliers em modelos de equações estruturais / Contributions to the analysis of outliers in structural equation models
Bulhões, Rodrigo de Souza 10 May 2013
O Modelo de Equações Estruturais (MEE) é habitualmente ajustado para realizar uma análise confirmatória sobre as conjecturas de um pesquisador acerca do relacionamento entre as variáveis observadas e latentes de algum estudo. Na prática, a maneira mais recorrente de avaliar a qualidade das estimativas de um MEE é a partir de medidas que buscam mensurar o quanto a usual matriz de covariâncias clássicas ou ordinárias se distancia da matriz de covariâncias do modelo ajustado, ou a magnitude do afastamento entre as funções de discrepância do modelo hipotético e do modelo saturado. Entretanto, elas podem não captar problemas no ajuste quando há muitos parâmetros a estimar ou bastantes observações. A fim de detectar irregularidades no ajustamento resultantes do impacto provocado pela presença de outliers no conjunto de dados, este trabalho contemplou alguns indicadores conhecidos na literatura, como também considerou alterações no Índice da Qualidade do Ajuste (ou GFI, de Goodness-of-Fit Index) e no Índice Corrigido da Qualidade do Ajuste (ou AGFI, de Ajusted Goodness-of-Fit Index), ambos nas expressões para estimação de parâmetros pelo método de Máxima Verossimilhança, que consistiram em substituir a tradicional matriz de covariâncias pelas matrizes de covariâncias computadas com os seguintes estimadores: Elipsoide de Volume Mínimo, Covariância de Determinante Mínimo, S, MM e Gnanadesikan-Kettenring Ortogonalizado (GKO). Através de estudos de simulação sobre perturbações de desvio de simetria e excesso de curtose, em baixa e alta frações de contaminação, em diferentes tamanhos de amostra e quantidades de variáveis observadas afetadas, foi possível constatar que as propostas de modificação do GFI e do AGFI adaptadas pelo estimador GKO foram as únicas que conseguiram ser informativas em todas essas situações, devendo-se escolher a primeira ou a segunda respectivamente quando a quantidade de parâmetros a serem estimados é baixa ou elevada. 
/ The Structural Equation Model (SEM) is usually fitted to perform a confirmatory analysis of a researcher's conjectures about the relationship between the observed and latent variables of a study. In practice, the most common way of evaluating the quality of the estimates of a SEM is through measures that quantify either how far the usual classical (ordinary) covariance matrix is from the covariance matrix of the fitted model, or the size of the gap between the discrepancy functions of the hypothesized model and the saturated model. However, these measures may fail to capture problems in the fit when there are many parameters to estimate or many observations. In order to detect irregularities in the fit resulting from the presence of outliers in the data set, this study considered some indicators known in the literature and also considered modifications of the Goodness-of-Fit Index (GFI) and the Adjusted Goodness-of-Fit Index (AGFI), both in the expressions for parameter estimation by the Maximum Likelihood method. The modifications consisted of replacing the traditional covariance matrix with robust covariance matrices computed with the following estimators: Minimum Volume Ellipsoid, Minimum Covariance Determinant, S, MM and Orthogonalized Gnanadesikan-Kettenring (OGK). Simulation studies on perturbations of symmetry deviation and excess kurtosis, at low and high contamination fractions, with different sample sizes and numbers of affected observed variables, showed that the proposed modifications of the GFI and the AGFI based on the OGK estimator were the only ones that remained informative in all these situations. The modified GFI should be chosen when the number of parameters to be estimated is low, and the modified AGFI when it is high.
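For readers unfamiliar with the indices, the following Python sketch computes the GFI and AGFI in the form commonly given for Maximum Likelihood estimation, GFI = 1 - tr[(Σ⁻¹S - I)²] / tr[(Σ⁻¹S)²] and AGFI = 1 - [p(p + 1) / (2 df)] (1 - GFI), where S is the input covariance matrix, Σ the model-implied covariance matrix, p the number of observed variables and df the model degrees of freedom. The abstract does not reproduce these expressions, so treating them as the ones used in the thesis is an assumption. The helper names, the restriction to 2x2 matrices, and the example matrices are hypothetical. The modification described above would amount to passing a robust covariance estimate as `s` in place of the classical one.

```python
def inverse_2x2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def matmul_2x2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace_of_square(a):
    squared = matmul_2x2(a, a)
    return squared[0][0] + squared[1][1]


def gfi_ml(s, sigma):
    """GFI in its ML form: 1 - tr[(W - I)^2] / tr[W^2], with W = sigma^{-1} s."""
    w = matmul_2x2(inverse_2x2(sigma), s)
    w_minus_identity = [[w[0][0] - 1.0, w[0][1]],
                        [w[1][0], w[1][1] - 1.0]]
    return 1.0 - trace_of_square(w_minus_identity) / trace_of_square(w)


def agfi(gfi, p, df):
    """AGFI: the GFI adjusted for model degrees of freedom."""
    return 1.0 - (p * (p + 1) / (2.0 * df)) * (1.0 - gfi)


# Hypothetical example: the model implies uncorrelated variables,
# while the input covariance matrix shows a correlation of 0.5.
s = [[1.0, 0.5], [0.5, 1.0]]
sigma = [[1.0, 0.0], [0.0, 1.0]]
g = gfi_ml(s, sigma)
print("GFI:", g, "AGFI:", agfi(g, p=2, df=1))
```

When the model-implied matrix reproduces the input matrix exactly, the GFI equals 1, and any discrepancy lowers it, so the choice of input covariance estimator directly affects the index.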
|