21 |
Outlier Detection In Big Data
Cao, Lei 29 March 2016
This dissertation focuses on scaling outlier detection to both huge static datasets and dynamic streaming datasets. Outliers are patterns in the data that do not conform to expected behavior. Outlier detection techniques are broadly applied in applications ranging from credit fraud prevention and network intrusion detection to tactical planning for stock investment. For such mission-critical applications, a timely response is often of paramount importance. Yet processing outlier detection requests is algorithmically complex and resource-intensive. In this dissertation we investigate the challenges of detecting outliers in big data, in particular those caused by the high velocity of streaming data, the large volume of static data, and the large cardinality of the input parameter space for tuning outlier mining algorithms. Effective optimization techniques are proposed to assure the responsiveness of outlier detection in big data.

First, we propose a novel optimization framework called LEAP to continuously detect outliers over data streams. The continuous discovery of outliers is critical for a large range of online applications that monitor high-volume, continuously evolving streaming data. LEAP encompasses two general optimization principles that utilize the rarity of the outliers and the temporal priority relationships among stream data points. Leveraging these two principles, LEAP not only continuously delivers outliers with respect to a set of popular outlier models, but also provides near real-time support for processing powerful outlier analytics workloads composed of large numbers of outlier mining requests with various parameter settings.

Second, we develop a distributed approach to efficiently detect outliers over massive-scale static datasets. In this big data era, as the volume of data advances to new levels, the power of distributed compute clusters must be employed to detect outliers with a short turnaround time. Our approach optimizes the key factors determining the efficiency of distributed data analytics, namely communication costs and load balancing. In particular, we prove that the traditional frequency-based load balancing assumption is not effective. We thus design a novel cost-driven data partitioning strategy that achieves load balancing. Furthermore, we abandon the traditional approach of using one detection algorithm for all compute nodes and instead propose a novel multi-tactic methodology that adaptively selects the most appropriate algorithm for each node based on the characteristics of the data partition assigned to it.

Third, traditional outlier detection systems process each individual outlier detection request, instantiated with a particular parameter setting, one at a time. This is not only prohibitively time-consuming for large datasets, but also tedious for analysts as they explore the data to home in on the most appropriate parameter setting or on the desired results. We thus design an interactive outlier exploration paradigm that not only answers traditional outlier detection requests in near real-time, but also offers innovative outlier analytics tools that help analysts quickly extract, interpret, and understand the outliers of interest. Our experimental studies, comprising performance evaluations and user studies conducted on real-world stock, sensor, moving object, and geolocation datasets, confirm both the effectiveness and efficiency of the proposed approaches.
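To make the streaming setting concrete, here is a minimal sketch of one widely used outlier model, the distance-threshold model, evaluated over a count-based sliding window. This is a naive baseline and not the LEAP framework; the abstract does not name the outlier models it supports, so the choice of model, the function name `window_outliers`, and the parameters `window`, `r`, and `k` are all assumptions made for illustration. The inner loop stops as soon as `k` neighbors are found and scans the newest points first, which only loosely echoes the rarity and temporal-priority ideas mentioned above.

```python
from collections import deque
from math import dist


def window_outliers(stream, window, r, k):
    """Yield (t, outliers) after each arrival.

    A point in the current window is an outlier if fewer than k other
    points in the window lie within distance r of it (distance-threshold
    model, assumed here for illustration).
    """
    buf = deque(maxlen=window)  # keeps only the most recent `window` points
    for t, p in enumerate(stream):
        buf.append(p)
        points = list(buf)
        outliers = []
        for i, q in enumerate(points):
            neighbors = 0
            # Scan newest points first; stop as soon as k neighbors are found,
            # since most points are inliers and need no further checking.
            for j in range(len(points) - 1, -1, -1):
                if j != i and dist(q, points[j]) <= r:
                    neighbors += 1
                    if neighbors >= k:
                        break
            if neighbors < k:
                outliers.append(q)
        yield t, outliers
```

Every window is re-evaluated from scratch here, which is exactly the cost that incremental stream algorithms try to avoid.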
|
22 |
From Outlaw to Outlier: The Role of Teacher Attachment Style in Addressing Student Behavior Problems in Kindergarten
Durkee, Wendy L 01 December 2017
The purpose of this study was to add to the understanding of how teachers impact the emotional and behavioral development of kindergartners. This study looked at teacher beliefs and internal thought patterns about a student whose emotion regulation is immature and whose behavior is disruptive and challenging for his or her teacher. It examined multiple aspects of the teacher’s response to the student’s behavior in order to answer the questions: Are the strategies used by the teacher for managing disruptive and challenging behavior consistent with her attachment style? How does this affect the academic trajectory of the student?
Based on results of the Student-Teacher Relationship Scale (STRS) and the Teacher Relationship Interview (TRI), the primary findings of the study indicate that most of the teachers participating in the study were engaging with a challenging student from a secure attachment classification. The STRS provided information about the teacher’s concern for the ability of the student to make an adequate adjustment to school. Those students with high conflict and low total scores were most likely to have behavior problems in 2nd grade. Also, the level of stress produced by the highly conflictual relationship was at times destabilizing for the teacher. Depending on whether the attachment status of the teacher was secure-continuous, secure-earned, or insecure, the ability of the teacher to be resilient in the face of the stress was affected.
|
23 |
Performance Enhancement of Bearing Navigation to Known Radio Beacons / Prestandaförbättring av navigering efter bäring mot kända radiofyrar
Erkstam, Erik, Tjernqvist, Emil January 2012
This master's thesis investigates the performance of a car navigation system using lateral accelerometers, yaw rate and bearings relative to three known radio beacons. Accelerometer, gyroscope and position data were collected by an IMU combined with a GPS receiver, where the IMU was installed at the approximate motion center of a car. The bearing measurements are simulated using GPS data, and the measurement noise model is derived from an experiment in which the direction of arrival to one transmitter was estimated by an antenna array and the signal processing algorithm MUSIC. The measurements are fused in a multi-rate extended Kalman filter, which assumes that all measurement noise is Gaussian distributed. This is not the case for the bearing measurement noise, which contains outliers and is therefore modelled as a Gaussian-uniform noise mixture. Different methods to deal with this have been investigated, with the main focus on using the Kalman filter’s innovation for each bearing measurement as an indication of its quality and discarding measurements whose innovation exceeds a certain threshold.
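The innovation-based rejection principle can be sketched for the simplest possible case: a scalar state observed directly. This is an illustration under stated assumptions, not the thesis's multi-rate extended Kalman filter. The function name `gated_update`, the default threshold of 9.0, and the direct-observation model (H = 1) are all hypothetical, and the angle wrapping needed for real bearing innovations is omitted.

```python
def gated_update(x, P, z, R, gate=9.0):
    """One scalar Kalman measurement update with innovation gating.

    x, P : prior state estimate and its variance
    z, R : measurement and its noise variance
    gate : threshold on the normalized innovation squared; 9.0 is an
           assumed value, equivalent to 3 standard deviations for a
           scalar innovation
    Returns (x, P, accepted).
    """
    nu = z - x   # innovation: measurement minus predicted measurement (H = 1)
    S = P + R    # innovation variance
    if nu * nu / S > gate:
        # Innovation is implausibly large: treat the measurement as an
        # outlier and keep the prior unchanged.
        return x, P, False
    K = P / S    # Kalman gain
    return x + K * nu, (1.0 - K) * P, True
```

For example, `gated_update(0.0, 1.0, 0.5, 1.0)` accepts the measurement and returns `(0.25, 0.5, True)`, while `gated_update(0.0, 1.0, 10.0, 1.0)` rejects it and returns the prior `(0.0, 1.0, False)`.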
|
24 |
A robust fit for generalized additive models
Alimadad, Azadeh, January 1900
Thesis (M.Sc.) - Carleton University, 2005. / Includes bibliographical references (p. 77-80). Also available in electronic format on the Internet.
|
25 |
Application of the kens test for outlier detection in optical flow / Aplicação do teste kens para detecção de outliers em fluxo ótico
MACÊDO, Samuel Victor Medeiros de 01 March 2013
CNPq / Petrobrás / CHESF
3D reconstruction has been widely explored, especially in recent years, with the popularization of tools for visualizing three-dimensional objects. The search for efficient algorithms that make the 3D reconstruction pipeline more efficient is the subject of much university research and of patents in both industry and academia. Currently, some problems in reconstructing meshes with a large number of points using the reconstruction pipeline [40] still persist, even when only a few constraints are applied. These problems are caused by the high computational power demanded by the usual techniques. Among these techniques are feature tracking in images [49] and the generation and evaluation of several camera pose hypotheses to find the one that best fits the scene in question [37].
3D reconstruction can be very useful in several areas, such as markerless augmented reality, for manipulating virtual objects that interact physically with the real world and for handling the occlusion of virtual objects by real objects. Given these problems and the diversity of applications, changes to the 3D reconstruction pipeline that make it faster and more efficient are of interest both to the field of computer vision and to industry.
In this context, this dissertation proposes a methodology for optimizing the 3D reconstruction pipeline that draws on concepts of statistical inference, more precisely hypothesis testing. The kens test is a statistical hypothesis test developed in this dissertation to verify the smoothness of a trajectory. The test is applied to the paths of the features, since they are tracked using optical flow. Although it is not proved mathematically that inlier features follow smooth paths, this work shows evidence of a relationship between smoothness and inliers, because removing the features with non-smooth paths improved the quality of the 3D reconstruction.
This master's dissertation describes all the theoretical tools needed to understand the 3D reconstruction pipeline and the kens test. The use of the technique in two scenarios is presented: one synthetic and one real.
|
26 |
Estimating p-values for outlier detection
Norrman, Henrik January 2014
Outlier detection is useful in a vast number of different domains, wherever there is data and a need for analysis. The research area related to outlier detection is large and the number of available approaches is constantly growing. Most of the approaches produce a binary result: either outlier or not. In this work, approaches that detect outliers by producing a p-value estimate are investigated. Approaches that estimate p-values are interesting because their results can easily be compared against each other, followed over time, or used with a variable threshold. Four approaches are subjected to a variety of tests that attempt to measure their suitability when the data is distributed in a number of ways. The first approach, the R2S, was developed at Halmstad University and is based on finding the mid-point of the data. The second approach is based on one-class support vector machines (OCSVM). The third and fourth approaches are both based on conformal anomaly detection (CAD), but use different nonconformity measures (NCM): the Mahalanobis distance to the mean and a variation of k-NN. The R2S and the CAD Mahalanobis are both good at estimating p-values from data generated by unimodal and symmetrical distributions. The CAD k-NN is good at estimating p-values when the data is generated by a bimodal or extremely asymmetric distribution. The OCSVM does not excel in any scenario, but produces good average results in most of the tests. The approaches are also applied to real data, where they all produce comparable results.
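As an illustration of how a nonconformity measure turns into a p-value, the sketch below uses a split (inductive) conformal setup with a k-NN score. It is a generic example, not the thesis's implementation: the split into `train` and `calib` sets, the function names, and the specific score (mean distance to the k nearest training points) are assumptions. The p-value is the standard conformal one: the fraction of calibration scores at least as large as the test score, with the test point itself counted.

```python
from math import dist


def knn_score(x, reference, k):
    """Nonconformity score: mean distance from x to its k nearest reference points."""
    d = sorted(dist(x, y) for y in reference)
    return sum(d[:k]) / k


def conformal_p_value(x, train, calib, k=2):
    """Estimate a p-value for x under a split-conformal scheme.

    train : points used only to compute nonconformity scores
    calib : held-out points whose scores form the reference distribution
    A small p-value means x looks stranger than most calibration points.
    """
    cal_scores = [knn_score(c, train, k) for c in calib]
    s = knn_score(x, train, k)
    at_least_as_strange = sum(1 for a in cal_scores if a >= s)
    return (at_least_as_strange + 1) / (len(cal_scores) + 1)
```

Because the output is a p-value and not a yes/no label, the same score can be compared across detectors or thresholded at whatever significance level an application needs.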
|
27 |
The selection of different averaging approaches on whole-body vibration exposure levels of a driver utilising the ISO 2631-1 standard
Bester, Duane January 2014
Limited research has been conducted on inconsistencies relating to whole-body
vibration (WBV) field assessments. Therefore, this study aimed to investigate a certain
possible contributor to inconsistencies in vibration assessment work, namely averaging
intervals. To our knowledge, this was the first study investigating the effect of multiple
averaging approaches on WBV results. WBV parameters were measured for a driver
operating a vehicle on a preselected test route utilising ISO 2631-1:1997. This was
achieved utilising a Quest HavPro vibration monitor with a fitted tri-axial Integrated
Circuit Piezoelectric (ICP) accelerometer pad mounted on the driver’s seat.
Furthermore, in an attempt to decrease differences between observed WBV results, an
outlier detection method, part of the STATA software package, was utilised to clean the
data. Statistical analyses included hypothesis testing in the form of one-way ANOVA
and Kruskal-Wallis one-way analysis of variance by ranks to determine significant
differences between integration intervals. Logged data time-series durations showed a
W0 = 0.04, therefore indicating unequal variance. Omission of 60s from statistical
analyses showed a W0 = 0.28. The observed difference occurs when data is averaged
over longer intervals, resulting in portions of data not being reflected in the final dataset.
In addition, frequency-weighted root mean squared acceleration results reflected
significant differences between 1s, 10s, 30s, 60s and SLOW averaging approaches,
while non-significant differences were observed for crest factors and instantaneous
peak accelerations. Vibration Dose Value results reflected non-significant differences
after omission of 60 second averaging interval data. Cleaned data showed significant
differences between various averaging approaches as well as significant differences
when compared with raw vibration data. The study therefore outlined certain
inconsistencies pertaining to the selection of multiple integration intervals during the
assessment of WBV exposure. Data filtering could not provide a conclusion on a
suitable averaging period and as such, further research is required to determine the
correct averaging interval to be used for WBV assessment. / Dissertation (MPH)--University of Pretoria, 2014. / tm2015 / School of Health Systems and Public Health (SHSPH) / MPH / Unrestricted
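The quantities compared in this study can be sketched from their standard definitions: root mean square (RMS) acceleration, crest factor (peak over RMS), and Vibration Dose Value (fourth root of the time integral of the fourth power of acceleration). The sketch assumes the samples have already been frequency-weighted and are given in m/s² at a fixed sampling rate `fs`. The function names are illustrative and do not come from the dissertation or from any instrument's software. `interval_rms` shows one way an averaging interval can leave part of a recording out of the final dataset: an incomplete trailing block is dropped.

```python
from math import sqrt


def rms(samples):
    """Root mean square of (already frequency-weighted) acceleration samples."""
    return sqrt(sum(a * a for a in samples) / len(samples))


def crest_factor(samples):
    """Ratio of the peak absolute acceleration to the RMS value."""
    return max(abs(a) for a in samples) / rms(samples)


def vdv(samples, fs):
    """Vibration Dose Value: fourth root of the integral of a^4 over time.

    The integral is approximated by a sum with time step 1/fs.
    """
    return (sum(a ** 4 for a in samples) / fs) ** 0.25


def interval_rms(samples, fs, interval_s):
    """RMS per averaging interval; an incomplete trailing block is discarded."""
    n = int(fs * interval_s)       # samples per averaging interval
    full_blocks = len(samples) // n
    return [rms(samples[i * n:(i + 1) * n]) for i in range(full_blocks)]
```

With 10 samples at 1 Hz and a 4-second interval, `interval_rms` returns two values and ignores the last two samples, which is the kind of loss that grows with longer averaging intervals.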
|
28 |
Efficient Algorithms for Mining Large Spatio-Temporal Data
Chen, Feng 21 January 2013
Knowledge discovery on spatio-temporal datasets has attracted growing interest. Recent advances in remote sensing technology mean that massive amounts of spatio-temporal data are being collected, and their volume keeps increasing at an ever faster pace. It has become critical to design efficient algorithms for identifying novel and meaningful patterns in massive spatio-temporal datasets. Unlike other data sources, this data exhibits significant space-time statistical dependence, and the i.i.d. assumption is no longer valid. Exact modeling of space-time dependence causes model complexity to grow exponentially as the data size increases. This research focuses on the construction of efficient and effective approaches using approximate inference techniques for three main mining tasks: spatial outlier detection, robust spatio-temporal prediction, and novel applications to real-world problems.

Spatial novelty patterns, or spatial outliers, are those data points whose characteristics are markedly different from those of their spatial neighbors. There are two major branches of spatial outlier detection methodologies: global Kriging based and local Laplacian smoothing based. The former requires exact modeling of spatial dependence, which is time intensive; the latter requires the i.i.d. assumption for the smoothed observations, which is not statistically sound. Both approaches are constrained to numerical data, but in real-world applications we often face a variety of non-numerical data types, such as count, binary, nominal, and ordinal. To summarize, the main research challenges are: 1) how much spatial dependence can be eliminated via Laplacian smoothing; 2) how to effectively and efficiently detect outliers in large numerical spatial datasets; 3) how to generalize numerical detection methods and develop a unified outlier detection framework suitable for large non-numerical datasets; 4) how to achieve accurate spatial prediction even when the training data has been contaminated by outliers; 5) how to handle spatio-temporal data for the preceding problems.

To address the first and second challenges, we mathematically validated the effectiveness of Laplacian smoothing in eliminating spatial autocorrelation. This work provides fundamental support for existing Laplacian smoothing based methods. We also discovered a nontrivial side effect of Laplacian smoothing: it injects additional spatial variation into the data due to convolution effects. To capture this extra variability, we proposed a generalized local statistical model and designed two fast forward and backward outlier detection methods that achieve a better balance between computational efficiency and accuracy than most existing methods and are well suited to large numerical spatial datasets.

We addressed the third challenge by mapping non-numerical variables to latent numerical variables via a link function, such as the logit function used in logistic regression, and then utilizing error-buffer artificial variables, which follow a Student-t distribution, to capture the large variations caused by outliers. We proposed a unified statistical framework that integrates the advantages of the spatial generalized linear mixed model, the robust spatial linear model, reduced-rank dimension reduction, and the Bayesian hierarchical model. A linear-time approximate inference algorithm was designed to infer the posterior distribution of the error-buffer artificial variables conditioned on observations. We demonstrated that traditional numerical outlier detection methods can be directly applied to the estimated artificial variables for outlier detection. To the best of our knowledge, this is the first linear-time outlier detection algorithm that supports a variety of spatial attribute types, such as binary, count, ordinal, and nominal.

To address the fourth and fifth challenges, we proposed a robust version of the Spatio-Temporal Random Effects (STRE) model, namely the Robust STRE (R-STRE) model. The regular STRE model is a recently proposed statistical model for large spatio-temporal data that has linear time complexity, but it is not well suited to non-Gaussian and contaminated datasets. This deficiency can be systematically addressed by increasing the robustness of the model using heavy-tailed distributions, such as the Huber, Laplace, or Student-t distribution, to model the measurement error, instead of the traditional Gaussian. However, the resulting R-STRE model becomes analytically intractable, and direct application of approximate inference techniques still has cubic time complexity. To address the computational challenge, we reformulated the prediction problem as a maximum a posteriori (MAP) problem with a non-smooth objective function, transformed it into an equivalent quadratic programming problem, and developed an efficient interior-point numerical algorithm with near-linear complexity. This work presents the first near-linear-time robust prediction approach for large spatio-temporal datasets in both offline and online cases. / Ph. D.
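The definition of a spatial outlier used above (a point whose attribute differs markedly from that of its spatial neighbors) can be illustrated with a simple local-difference test. This is a textbook-style simplification and not the generalized local statistical model or the R-STRE method of the dissertation. The function name, the use of the k nearest neighbors as the neighborhood, and the threshold of two standard deviations are assumptions chosen for the example.

```python
from math import dist
from statistics import mean, pstdev


def spatial_outliers(points, values, k=4, threshold=2.0):
    """Return indices of points whose value differs markedly from their neighbors.

    For each location, compute the difference between its value and the mean
    value of its k nearest spatial neighbors, standardize these differences
    over all locations, and flag those beyond `threshold` standard deviations.
    """
    diffs = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        nearest = sorted(others, key=lambda j: dist(p, points[j]))[:k]
        neighbor_mean = mean([values[j] for j in nearest])
        diffs.append(values[i] - neighbor_mean)
    mu, sigma = mean(diffs), pstdev(diffs)
    if sigma == 0:
        return []  # all locations agree with their neighborhoods
    return [i for i, d in enumerate(diffs) if abs(d - mu) / sigma > threshold]
```

Note that a single extreme value also distorts the neighborhood means of the points around it, which is one reason more careful statistical models are needed in practice.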
|
29 |
A Time Series Approach to Removing Outlying Data Points from Bluetooth Vehicle Speed Data
Roth, Jennifer M. 13 December 2010
No description available.
|
30 |
Multivariate Functional Data Analysis and Visualization
Qu, Zhuo 11 1900
As a branch of statistics, functional data analysis (FDA) studies observations
regarded as curves, surfaces, or other objects evolving over a continuum. Although
methods and theories for FDA have flourished, two issues remain. First, functional
data are typically assumed to be sampled on common time grids; second, methods
developed only for univariate functional data are difficult to apply
to multivariate functional data. After exploring model-based fitting
for regularly observed multivariate functional data, we explore new visualization
tools, clustering, and multivariate functional depths for irregularly observed
(sparse) multivariate functional data. The four main chapters that comprise the
dissertation are organized as follows. First, median polish for functional multivariate
analysis of variance (FMANOVA) is proposed with the implementation of
multivariate functional depths in Chapter 2. Numerical studies and environmental
datasets are considered to illustrate the robustness of median polish. Second, the
sparse functional boxplot and the intensity sparse functional boxplot, as practical
exploratory tools that make visualization possible for both complete and sparse
functional data, are introduced in Chapter 3. These visualization tools depict
sparseness characteristics in the proportion of sparseness and relative intensity
of fitted sparse points inside the central region, respectively. Third, a
distance-based robust two-layer partition (RTLP) clustering of sparse multivariate
functional data is introduced in Chapter 4. The RTLP clustering is based
on our proposed elastic time distance (ETD) specifically for sparse multivariate
functional data. Lastly, the multivariate functional integrated depth and the multivariate
functional extremal depth based on multivariate depths are proposed in
Chapter 5. Global and local formulas for each depth are explored, with theoretical
properties being proved and the finite sample depth estimation for irregularly
observed multivariate functional data being investigated. In addition, the simplified
sparse functional boxplot and simplified intensity sparse functional boxplot for
visualization without data reconstruction are introduced. Together, these four
extensions make methods for multivariate functional data more general and of practical
interest in exploratory multivariate functional data analysis.
|