21 |
Detection of outliers in failure data
Gallup, Donald Robert. January 2011
Typescript (photocopy). / Digitized by Kansas Correctional Industries
|
22 |
Analysis of outliers using graphical and quasi-Bayesian methods
馮榮錦, Fung, Wing-kam, Tony. January 1987
published_or_final_version / Statistics / Doctoral / Doctor of Philosophy
|
23 |
Outlier detection by network flow
Liu, Ying. January 2007
Thesis (Ph. D.)--University of Alabama at Birmingham, 2007. / Additional advisors: Elliot J. Lefkowitz, Kevin D. Reilly, Robert Thacker, Chengcui Zhang. Description based on contents viewed Feb. 7, 2008; title from title screen. Includes bibliographical references (p. 125-132).
|
24 |
Advances in statistical inference and outlier related issues
Childs, Aaron Michael. Balakrishnan N. Unknown Date
Thesis (Ph.D.)--McMaster University (Canada), 1996. / Source: Dissertation Abstracts International, Volume: 57-10, Section: B, page: 6347. Adviser: N. Balakrishnan.
|
25 |
Robust procedures for mediation analysis
Zu, Jiyun. January 2009
Thesis (Ph. D.)--University of Notre Dame, 2009. / Thesis directed by Ke-Hai Yuan for the Department of Psychology. "July 2009." Includes bibliographical references (leaves 151-155).
|
26 |
Robust second-order least squares estimation for linear regression models
Chen, Xin. 10 November 2010
The second-order least-squares estimator (SLSE), which was proposed by Wang (2003), is asymptotically more efficient than the least-squares estimator (LSE) if the third moment of the error distribution is nonzero. However, it is not robust against outliers. In this paper, we propose two robust second-order least-squares estimators (RSLSE) for linear regression models, RSLSE-I and RSLSE-II, where RSLSE-I is robust against X-outliers and RSLSE-II is robust against both X-outliers and Y-outliers. The basic idea is to choose proper weight matrices, which give a zero weight to an outlier. The RSLSEs are asymptotically normally distributed and are highly efficient with a high breakdown point. Moreover, we compare the RSLSEs with the LSE, the SLSE and the robust MM-estimator through simulation studies and real data examples. The results show that they perform very well and are competitive with other robust regression estimators.
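The weighting idea mentioned in this abstract (an observation treated as an outlier receives zero weight) can be illustrated with ordinary weighted least squares for a straight line. This is only a sketch of that general idea, not the RSLSE itself: the function name `weighted_line_fit`, the toy data, and the hand-picked weights are all assumptions made for illustration, and the abstract does not say how its weight matrices are constructed.

```python
def weighted_line_fit(x, y, w):
    """Fit y = intercept + slope * x by weighted least squares.

    Observations with weight 0 have no effect on the fit.
    (Hypothetical helper for illustration; not the RSLSE.)
    """
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Toy data on the line y = 1 + 2x, except for the last point (a Y-outlier).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 40.0]

# Equal weights: the outlier pulls the fitted slope far away from 2.
plain_intercept, plain_slope = weighted_line_fit(x, y, [1, 1, 1, 1, 1])

# Zero weight on the suspected outlier: the fit recovers the clean line.
robust_intercept, robust_slope = weighted_line_fit(x, y, [1, 1, 1, 1, 0])
```

In practice the weights would come from a data-driven rule rather than being set by hand; the example only shows why a zero weight removes an observation's influence.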
|
27 |
Outliers and robust response surface designs
O'Gorman, Mary Ann. January 1984
A commonly occurring problem in response surface methodology is that of inconsistencies in the response variable. These inconsistencies, or maverick observations, are referred to here as outliers. Many models exist for describing these outliers. Two of these models, the mean shift and the variance inflation outlier models, are employed in this research.
Several criteria are developed for determining when the outlying observation is detrimental to the analysis. These criteria all lead to the same condition which is used to develop statistical tests of the null hypothesis that the outlier is not detrimental to the analysis. These results are extended to the multiple outlier case for both models.
The robustness of response surface designs is also investigated. Robustness to outliers, missing data and errors in control are examined for first order models. The orthogonal designs with large second moments, such as the 2ᵏ factorial designs, are optimal in all three cases.
In the second order case, robustness to outliers and to missing data are examined. Optimal design parameters are obtained by computer for the central composite, Box-Behnken, hybrid, small composite and equiradial designs. Similar results are seen for both robustness to outliers and to missing data. The central composite turns out to be the optimal design type and of the two economical design types the small composite is preferred to the hybrid. / Ph. D.
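For readers unfamiliar with the two outlier models named in this abstract, their usual textbook forms for a linear model can be written as follows. The notation (a single outlying observation with index k, shift δ, inflation factor a) is an assumption for illustration and may differ from the parameterization used in the dissertation.

```latex
\begin{align*}
% Mean shift outlier model: observation k has its mean displaced by \delta
y_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i \quad (i \neq k), &
y_k &= \mathbf{x}_k^{\top}\boldsymbol{\beta} + \delta + \varepsilon_k, &
\varepsilon_i &\sim N(0, \sigma^2) \ \text{for all } i, \\
% Variance inflation outlier model: observation k has an inflated error variance
y_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i \quad \text{for all } i, &
\operatorname{Var}(\varepsilon_i) &= \sigma^2 \quad (i \neq k), &
\operatorname{Var}(\varepsilon_k) &= a\,\sigma^2, \ a > 1.
\end{align*}
```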
|
28 |
Diagnostico de influencia em modelos de volatilidade estocastica / Influence diagnostics in stochastic volatility models
Martim, Simoni Fernanda. 14 August 2018
Advisors: Mauricio Enrique Zevallos Herencia, Luiz Koodi Hotta / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Computação Cientifica / Previous issue date: 2009
Abstract: Model diagnostics is a key step in assessing the quality of fitted models. One of the most important diagnostic tools is influence analysis. Peña (2005) introduced a way of assessing influence in regression models that evaluates how each point is influenced by the others in the sample. Hotta and Motta (2007) adapted this diagnostic strategy to the influence analysis of univariate stochastic volatility models. This dissertation presents a study of influence diagnostics for asymmetric univariate stochastic volatility models as well as for multivariate stochastic volatility models. The proposed methodologies are illustrated through the analysis of simulated data and real financial return series. / Master's / Financial Time Series / Master in Statistics
|
29 |
Multiple outlier detection and cluster analysis of multivariate normal data
Robson, Geoffrey. 12 1900
Thesis (MscEng)--Stellenbosch University, 2003. / ABSTRACT: Outliers may be defined as observations that are sufficiently aberrant to arouse the suspicion of the analyst as to their origin. They could be the result of human error, in which case they should be corrected, but they may also be an interesting exception that deserves further investigation.
Identification of outliers typically consists of an informal inspection of a plot of the data, but this is unreliable for dimensions greater than two. A formal procedure for detecting outliers allows for consistency when classifying observations. It also enables one to automate the detection of outliers by using computers.
The special case of univariate data is treated separately to introduce essential concepts, and also because it may well be of interest in its own right. We then consider techniques used for detecting multiple outliers in a multivariate normal sample, and go on to explain how these may be generalized to include cluster analysis.
Multivariate outlier detection is based on the Minimum Covariance Determinant (MCD) subset, which is therefore treated in detail. Exact bivariate algorithms were refined and implemented, and the solutions were used to establish the performance of the commonly used heuristic, Fast–MCD.
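The MCD idea in this abstract can be sketched for small bivariate samples by brute force: enumerate every subset of h points, keep the one whose sample covariance matrix has the smallest determinant, and flag points whose squared Mahalanobis distance from that subset's mean exceeds a chi-square cutoff. This is a minimal illustration under stated assumptions, not the refined exact algorithms of the thesis and not Fast–MCD; the function names, the toy data, the choice h = (n + p + 1) // 2, and the 0.975 cutoff are all assumptions, and the usual consistency correction and reweighting steps are omitted.

```python
import math
from itertools import combinations


def mean_and_cov(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def brute_force_mcd(points, h):
    """Return (indices, center, cov) of the h-subset with minimum covariance determinant."""
    best = None
    for idx in combinations(range(len(points)), h):
        center, (sxx, sxy, syy) = mean_and_cov([points[i] for i in idx])
        det = sxx * syy - sxy ** 2
        if best is None or det < best[0]:
            best = (det, idx, center, (sxx, sxy, syy))
    return best[1], best[2], best[3]


def squared_mahalanobis(p, center, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# Toy data: eight clustered points followed by two gross outliers (indices 8 and 9).
data = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.0), (1.2, 1.1),
        (-0.5, 0.6), (0.6, -0.7), (-0.3, -0.4), (0.8, 0.5),
        (20.0, 20.0), (-18.0, 22.0)]

h = (len(data) + 2 + 1) // 2  # common default with p = 2 variables
subset, center, cov = brute_force_mcd(data, h)

# For 2 degrees of freedom the chi-square quantile has the closed form -2 ln(1 - q).
cutoff = -2.0 * math.log(1.0 - 0.975)
flagged = [i for i, p in enumerate(data)
           if squared_mahalanobis(p, center, cov) > cutoff]
```

Because no consistency factor is applied, the raw MCD covariance tends to be too small, so some clean points may also exceed the cutoff. The enumeration cost grows combinatorially with the sample size, which is the practical reason heuristics such as Fast–MCD are used instead.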
|
30 |
Novos algoritmos de aprendizado para classificação de padrões utilizando floresta de caminhos ótimos / New learning algorithms for pattern classification using optimum-path forest
Castelo Fernández, César Christian. 05 November 2011
Advisors: Pedro Jussieu de Rezende, Alexandre Xavier Falcão / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação / Previous issue date: 2011
Abstract: Pattern recognition can be defined as the capacity to identify the class of an object among a given set of classes, based on the information provided by known samples (the training set). In this dissertation, the focus is on the supervised classification approach, in which the classes of all the samples used in the design of the classifier are known. Specifically, the Optimum-Path Forest (OPF) classifier is studied and three new learning algorithms are proposed, which represent improvements over the traditional OPF classifier.
First, a simple yet effective methodology is developed for the detection of outliers in the training set. This method aims to improve the accuracy of the traditional OPF classifier by swapping outliers for new samples from the evaluation set and excluding them from the learning process. Outliers are detected by computing a penalty for each sample based on its classification hits and misses, which can be measured through the number of false positives/negatives and true positives/negatives obtained by each sample. The method achieved an accuracy improvement over the traditional OPF, with just a slight increase in training time.
Next, an improvement to the first algorithm is proposed, allowing for a more precise detection of the outliers present in the dataset. In this case, the information on the number of false positives/negatives and true positives/negatives of each sample is used to explore the adjacency relations of each sample implicitly and to determine whether it is an outlier. The method's merit is that there is no need to compute such an adjacency explicitly, as traditional techniques do, which could be infeasible for large datasets. The method achieved a good outlier detection rate and a very low training time, considering the size of the datasets used.
Finally, the problem of choosing as small a number of training samples as possible while achieving the highest possible accuracy on the test set is addressed. We propose a methodology that starts with a small training set and, through the classification of a much larger evaluation set, learns which samples are the most representative for the training set. The results show that it is possible to achieve higher accuracy than the traditional OPF classifier at the cost of a slight increase in training time, while keeping the training set smaller than the original one, which leads to a lower testing time. / Master's / Computer Science / Master in Computer Science
|