About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
21

Outlier Detection In Big Data

Cao, Lei 29 March 2016 (has links)
The dissertation focuses on scaling outlier detection to work both on huge static and on dynamic streaming datasets. Outliers are patterns in the data that do not conform to expected behavior. Outlier detection techniques are broadly applied in applications ranging from credit-fraud prevention and network intrusion detection to stock-investment tactical planning. For such mission-critical applications, a timely response is often of paramount importance, yet the processing of outlier detection requests is algorithmically complex and resource-consuming. In this dissertation we investigate the challenges of detecting outliers in big data -- in particular those caused by the high velocity of streaming data, the large volume of static data, and the large cardinality of the input parameter space for tuning outlier mining algorithms. Effective optimization techniques are proposed to ensure the responsiveness of outlier detection in big data.

First, we propose a novel optimization framework called LEAP to continuously detect outliers over data streams. The continuous discovery of outliers is critical for a large range of online applications that monitor high-volume, continuously evolving streaming data. LEAP encompasses two general optimization principles that exploit the rarity of outliers and the temporal priority relationships among stream data points. Leveraging these two principles, LEAP not only continuously delivers outliers with respect to a set of popular outlier models, but also provides near-real-time support for powerful outlier analytics workloads composed of large numbers of outlier mining requests with various parameter settings.

Second, we develop a distributed approach to efficiently detect outliers over massive static datasets. In this big-data era, as data volumes advance to new levels, the power of distributed compute clusters must be employed to detect outliers within a short turnaround time. Our approach optimizes the key factors determining the efficiency of distributed data analytics, namely communication costs and load balancing. In particular, we prove that the traditional frequency-based load-balancing assumption is not effective, and we design a novel cost-driven data partitioning strategy that achieves load balancing. Furthermore, we abandon the traditional one-detection-algorithm-for-all-compute-nodes approach and instead propose a multi-tactic methodology that adaptively selects the most appropriate algorithm for each node based on the characteristics of its assigned data partition.

Third, traditional outlier detection systems process each individual outlier detection request, instantiated with a particular parameter setting, one at a time. This is not only prohibitively time-consuming for large datasets, but also tedious for analysts as they explore the data to home in on the most appropriate parameter setting or the desired results. We therefore design an interactive outlier exploration paradigm that not only answers traditional outlier detection requests in near real time, but also offers innovative outlier analytics tools to help analysts quickly extract, interpret, and understand the outliers of interest.

Our experimental studies, including performance evaluations and user studies conducted on real-world stock, sensor, moving-object, and geolocation datasets, confirm both the effectiveness and the efficiency of the proposed approaches.
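The abstract describes continuous distance-based outlier monitoring over data streams. The following is only an illustrative sketch, not the LEAP framework itself: a minimal distance-threshold outlier detector over a count-based sliding window that scans the window newest-first, loosely in the spirit of the temporal-priority principle. All parameter names and values here are assumptions for illustration.

```python
from collections import deque

def stream_outliers(points, window=5, r=1.0, k=2):
    """Flag each arriving 1-D point as an outlier if, at arrival time,
    fewer than k of the window's current points lie within distance r.
    Scanning newest-first mirrors the temporal-priority idea: younger
    neighbors expire later, so in a full implementation finding them
    first lets later re-evaluations be skipped.  The first arrivals are
    flagged simply because the window is still cold."""
    win = deque(maxlen=window)
    flags = []
    for p in points:
        neighbors = 0
        for q in reversed(win):          # newest first
            if abs(p - q) <= r:
                neighbors += 1
                if neighbors >= k:       # enough evidence of normality
                    break
        flags.append(neighbors < k)
        win.append(p)
    return flags
```

For example, in the stream `[1.0, 1.1, 0.9, 5.0, 1.05, 1.0]` with `r=0.5, k=2`, the value `5.0` is flagged because no windowed point lies within 0.5 of it.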
22

From Outlaw to Outlier: The Role of Teacher Attachment Style in Addressing Student Behavior Problems in Kindergarten

Durkee, Wendy L 01 December 2017 (has links)
The purpose of this study was to add to the understanding of how teachers impact the emotional and behavioral development of kindergartners. The study looked at teacher beliefs and internal thought patterns about a student whose emotion regulation is immature and whose behavior is disruptive and challenging for his or her teacher. It examined multiple aspects of the teacher's response to the student's behavior in order to answer the questions: Are the strategies used by the teacher for managing disruptive and challenging behavior consistent with her attachment style? How does this affect the academic trajectory of the student? Based on the results of the Student-Teacher Relationship Scale (STRS) and the Teacher Relationship Interview (TRI), the primary findings indicate that most of the participating teachers were engaging with a challenging student from a secure attachment classification. The STRS provided information about the teacher's concern for the student's ability to make an adequate adjustment to school. Students with high conflict and low total scores were most likely to have behavior problems in second grade. The level of stress produced by the highly conflictual relationship was at times destabilizing for the teacher, and the teacher's ability to remain resilient in the face of that stress depended on whether her attachment status was secure-continuous, secure-earned, or insecure.
23

Performance Enhancement of Bearing Navigation to Known Radio Beacons / Prestandaförbättring av navigering efter bäring mot kända radiofyrar

Erkstam, Erik, Tjernqvist, Emil January 2012 (has links)
This master's thesis investigates the performance of a car navigation system using lateral accelerometers, yaw rate, and bearings relative to three known radio beacons. Accelerometer, gyroscope, and position data were collected by an IMU combined with a GPS receiver, with the IMU installed at the approximate motion center of a car. The bearing measurements are simulated using GPS data, and the measurement noise model is derived from an experiment in which the direction of arrival to one transmitter was estimated by an antenna array and the MUSIC signal processing algorithm. The measurements are fused in a multi-rate extended Kalman filter, which assumes that all measurement noise is Gaussian. This is not the case for the bearing measurement noise, which contains outliers and is therefore modelled as a Gaussian-uniform noise mixture. Different methods of dealing with this have been investigated, with the main focus on using the Kalman filter's innovation for each bearing measurement as an indication of its quality and discarding measurements whose quality falls above a certain threshold.
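The innovation-gating principle described in the abstract is a standard tracking technique. A minimal scalar sketch follows, not the thesis's multi-rate EKF: the normalized innovation squared is compared against a chi-square-style gate, and measurements exceeding it are discarded. The values of `H`, `R`, and `gate` are illustrative assumptions.

```python
def kf_update_with_gating(x, P, z, H=1.0, R=0.1, gate=9.0):
    """One scalar Kalman measurement update with innovation gating.
    The normalized innovation squared nis = nu^2 / S serves as a
    quality measure; measurements with nis > gate (gate=9 is roughly
    a 3-sigma test) are rejected as outliers.
    Returns (new_x, new_P, accepted)."""
    nu = z - H * x           # innovation (measurement residual)
    S = H * P * H + R        # innovation covariance
    if nu * nu / S > gate:   # quality above threshold: discard
        return x, P, False
    K = P * H / S            # Kalman gain
    x = x + K * nu
    P = (1.0 - K * H) * P
    return x, P, True
```

With state `x=0, P=1` and `R=0.1`, a measurement `z=10` yields `nis ≈ 91` and is rejected, while `z=0.5` passes the gate and updates the state normally.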
24

A robust fit for generalized additive models

Alimadad, Azadeh, January 1900 (has links)
Thesis (M.Sc.) - Carleton University, 2005. / Includes bibliographical references (p. 77-80). Also available in electronic format on the Internet.
25

Application of the kens test for outlier detection in optical flow / Aplicação do teste kens para detecção de outliers em fluxo ótico

MACÊDO, Samuel Victor Medeiros de 01 March 2013 (has links)
3D reconstruction has been widely explored, especially in recent years with the popularization of tools for visualizing three-dimensional objects. The search for efficient algorithms that make the 3D reconstruction pipeline more efficient is the target of several university research projects and patents, both in industry and in academia. Currently, some problems in reconstructing meshes with a large number of points using the reconstruction pipeline [40] still persist, even when only a few restrictions are applied. These problems are caused by the high computational power demanded by the usual techniques, among them the tracking of points in images (feature tracking) [49] and the generation and evaluation of several camera-pose hypotheses to find the one best suited to the scene in question [37].

3D reconstruction can be quite useful in several areas, such as markerless augmented reality, the manipulation of virtual objects that interact physically with the real world, and the treatment of occlusion of virtual objects by real objects. Given these problems and the diversity of applications, changes to the 3D reconstruction pipeline that make it faster and more efficient are of interest both to the computer vision field and to industry. In this context, this dissertation proposes a methodology for optimizing the 3D reconstruction pipeline by exploiting concepts from statistical inference, more precisely hypothesis testing. The kens test is a statistical hypothesis test developed in this dissertation to verify the smoothness of a trajectory. The test is applied to the paths of features, since they are tracked using optical flow. Although it is not mathematically proven that inlier features follow smooth paths, this work shows evidence of a relationship between smoothness and inliers: after removing the features with non-smooth paths, the quality of the 3D reconstruction improved. This master's dissertation describes all the theoretical background required to understand the 3D reconstruction pipeline and the kens test, and presents the use of the technique in two scenarios, one synthetic and one real. / CNPQ, Petrobrás, CHESF
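The kens test's actual statistic is defined in the dissertation; as a generic stand-in, the sketch below scores each feature track by the mean magnitude of its discrete second differences (a rough smoothness proxy) and flags tracks with unusually high roughness. The scoring rule and threshold are assumptions for illustration only.

```python
import statistics

def rough_tracks(tracks, thresh=2.0):
    """Not the kens test itself: a generic smoothness filter.
    Each track (a list of 1-D positions over time) is scored by the
    mean absolute discrete second difference; tracks scoring more than
    `thresh` standard deviations above the mean roughness are flagged
    as non-smooth, i.e. likely optical-flow outliers."""
    def roughness(tr):
        acc = [abs(tr[i + 1] - 2 * tr[i] + tr[i - 1])
               for i in range(1, len(tr) - 1)]
        return statistics.fmean(acc)
    scores = [roughness(t) for t in tracks]
    mu, sd = statistics.fmean(scores), statistics.stdev(scores)
    return [(s - mu) / sd > thresh for s in scores]
```

Among four linear tracks and one zig-zag track, only the zig-zag is flagged at a threshold of 1.5 standard deviations.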
26

Estimating p-values for outlier detection

Norrman, Henrik January 2014 (has links)
Outlier detection is useful in a vast number of domains: wherever there is data, there is a need for analysis. The research area of outlier detection is large, and the number of available approaches is constantly growing. Most approaches produce a binary result: outlier or not. In this work, approaches that detect outliers by producing a p-value estimate are investigated. Such approaches are interesting because their results can easily be compared against each other, followed over time, or used with a variable threshold. Four approaches are subjected to a variety of tests to measure their suitability when the data is distributed in a number of ways. The first approach, R2S, was developed at Halmstad University and is based on finding the mid-point of the data. The second approach is based on one-class support vector machines (OCSVM). The third and fourth approaches are both based on conformal anomaly detection (CAD), but use different nonconformity measures (NCMs): the Mahalanobis distance to the mean and a variation of k-NN. R2S and CAD with the Mahalanobis NCM are both good at estimating p-values for data generated by unimodal, symmetric distributions. CAD with the k-NN NCM is good at estimating p-values when the data is generated by a bimodal or extremely asymmetric distribution. The OCSVM does not excel in any scenario, but produces good average results in most of the tests. The approaches are also applied to real data, where they all produce comparable results.
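The conformal p-value described in the abstract follows a standard recipe: rank the test point's nonconformity score against the calibration scores. The sketch below uses a one-dimensional stand-in for the Mahalanobis-to-the-mean NCM (distance to the mean scaled by the standard deviation); it is a simplified illustration, not the thesis's implementation.

```python
import statistics

def conformal_p_value(train, x):
    """Conformal p-value for test point x given a calibration set:
    p = (#{calibration scores >= score(x)} + 1) / (n + 1).
    NCM: distance to the mean divided by the standard deviation,
    a 1-D stand-in for the Mahalanobis distance to the mean.
    A small p-value marks x as anomalous."""
    mu = statistics.fmean(train)
    sd = statistics.stdev(train)
    score = lambda v: abs(v - mu) / sd
    s_x = score(x)
    n_ge = sum(1 for v in train if score(v) >= s_x)
    return (n_ge + 1) / (len(train) + 1)
```

A far-out point gets the smallest possible p-value, `1/(n+1)`, while a point at the center of the calibration data gets `p = 1`, so results from different detectors become directly comparable.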
27

The selection of different averaging approaches on whole-body vibration exposure levels of a driver utilising the ISO 2631-1 standard

Bester, Duane January 2014 (has links)
Limited research has been conducted on inconsistencies in whole-body vibration (WBV) field assessments. This study therefore investigated one possible contributor to such inconsistencies, namely the choice of averaging interval. To our knowledge, this was the first study to investigate the effect of multiple averaging approaches on WBV results. WBV parameters were measured for a driver operating a vehicle on a preselected test route following ISO 2631-1:1997, using a Quest HavPro vibration monitor with a tri-axial Integrated Circuit Piezoelectric (ICP) accelerometer pad mounted on the driver's seat. Furthermore, in an attempt to decrease differences between observed WBV results, an outlier detection method from the Stata software package was used to clean the data. Statistical analyses included hypothesis testing in the form of one-way ANOVA and the Kruskal-Wallis one-way analysis of variance by ranks to determine significant differences between integration intervals. Logged time-series durations showed W0 = 0.04, indicating unequal variance; omitting the 60 s interval from the statistical analyses gave W0 = 0.28. The observed difference arises when data are averaged over longer intervals, so that portions of the record are not reflected in the final dataset. In addition, frequency-weighted root-mean-square acceleration results showed significant differences between the 1 s, 10 s, 30 s, 60 s, and SLOW averaging approaches, while differences in crest factors and instantaneous peak accelerations were non-significant. Vibration dose value results showed non-significant differences after omission of the 60 s averaging-interval data. Cleaned data showed significant differences between the various averaging approaches, as well as significant differences when compared with the raw vibration data.
The study therefore outlined certain inconsistencies pertaining to the selection of multiple integration intervals during the assessment of WBV exposure. Data filtering could not provide a conclusion on a suitable averaging period and as such, further research is required to determine the correct averaging interval to be used for WBV assessment. / Dissertation (MPH)--University of Pretoria, 2014. / tm2015 / School of Health Systems and Public Health (SHSPH) / MPH / Unrestricted
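The tail-truncation effect the study attributes to long averaging intervals can be made concrete with a small sketch. This is only the averaging step: a real ISO 2631-1 computation would first apply the standard's frequency weighting to the acceleration signal, which is omitted here; function and parameter names are illustrative.

```python
import math

def interval_rms(samples, rate_hz, interval_s):
    """Split an acceleration record into complete averaging intervals
    and compute the RMS of each.  Samples in the incomplete tail are
    discarded, which is how longer intervals leave portions of the
    record unreflected in the final dataset.
    Returns (per-interval RMS list, number of dropped samples)."""
    n = int(rate_hz * interval_s)
    blocks = [samples[i:i + n]
              for i in range(0, len(samples) - n + 1, n)]
    rms = [math.sqrt(sum(v * v for v in b) / n) for b in blocks]
    used = len(blocks) * n
    return rms, len(samples) - used
```

For a 150 s record sampled at 1 Hz, a 60 s interval yields only two complete blocks and drops 30 s of data, whereas a 10 s interval uses the whole record.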
28

Efficient Algorithms for Mining Large Spatio-Temporal Data

Chen, Feng 21 January 2013 (has links)
Knowledge discovery on spatio-temporal datasets has attracted growing interest. Recent advances in remote sensing technology mean that massive amounts of spatio-temporal data are being collected, and the volume keeps increasing at an ever faster pace. It has become critical to design efficient algorithms for identifying novel and meaningful patterns in massive spatio-temporal datasets. Unlike other data sources, such data exhibit significant space-time statistical dependence, and the i.i.d. assumption is no longer valid. Exact modeling of space-time dependence causes exponential growth of model complexity as the data size increases. This research focuses on constructing efficient and effective approaches, using approximate inference techniques, for three main mining tasks: spatial outlier detection, robust spatio-temporal prediction, and novel applications to real-world problems.

Spatial novelty patterns, or spatial outliers, are data points whose characteristics are markedly different from those of their spatial neighbors. There are two major branches of spatial outlier detection methodology, based either on global Kriging or on local Laplacian smoothing. The former requires exact modeling of spatial dependence, which is time-intensive; the latter requires the i.i.d. assumption for the smoothed observations, which is not statistically solid. Both approaches are restricted to numerical data, but real-world applications often involve a variety of non-numerical data types, such as count, binary, nominal, and ordinal.

To summarize, the main research challenges are: 1) how much spatial dependence can be eliminated via Laplacian smoothing; 2) how to effectively and efficiently detect outliers in large numerical spatial datasets; 3) how to generalize numerical detection methods into a unified outlier detection framework suitable for large non-numerical datasets; 4) how to achieve accurate spatial prediction even when the training data has been contaminated by outliers; 5) how to handle spatio-temporal data in the preceding problems.

To address the first and second challenges, we mathematically validated the effectiveness of Laplacian smoothing in eliminating spatial autocorrelation. This work provides fundamental support for existing Laplacian-smoothing-based methods. We also discovered a nontrivial side effect of Laplacian smoothing, which injects additional spatial variation into the data due to convolution effects. To capture this extra variability, we proposed a generalized local statistical model and designed two fast forward and backward outlier detection methods that achieve a better balance between computational efficiency and accuracy than most existing methods and are well suited to large numerical spatial datasets.

We addressed the third challenge by mapping non-numerical variables to latent numerical variables via a link function, such as the logit function used in logistic regression, and then using error-buffer artificial variables, which follow a Student-t distribution, to capture the large variations caused by outliers. We proposed a unified statistical framework that integrates the advantages of the spatial generalized linear mixed model, the robust spatial linear model, reduced-rank dimension reduction, and Bayesian hierarchical modeling. A linear-time approximate inference algorithm was designed to infer the posterior distribution of the error-buffer artificial variables conditioned on the observations. We demonstrated that traditional numerical outlier detection methods can be applied directly to the estimated artificial variables for outlier detection. To the best of our knowledge, this is the first linear-time outlier detection algorithm that supports a variety of spatial attribute types, such as binary, count, ordinal, and nominal.

To address the fourth and fifth challenges, we proposed a robust version of the Spatio-Temporal Random Effects (STRE) model, namely the Robust STRE (R-STRE) model. The regular STRE model is a recently proposed statistical model for large spatio-temporal data with linear time complexity, but it is not well suited to non-Gaussian and contaminated datasets. This deficiency can be systematically addressed by increasing the robustness of the model using heavy-tailed distributions, such as the Huber, Laplace, or Student-t distribution, to model the measurement error instead of the traditional Gaussian. However, the resulting R-STRE model becomes analytically intractable, and direct application of approximate inference techniques still has cubic time complexity. To address this computational challenge, we reformulated the prediction problem as a maximum a posteriori (MAP) problem with a non-smooth objective function, transformed it into an equivalent quadratic programming problem, and developed an efficient interior-point numerical algorithm with near-linear complexity. This work presents the first near-linear-time robust prediction approach for large spatio-temporal datasets, in both offline and online settings. / Ph. D.
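The Laplacian-smoothing branch of spatial outlier detection described in the abstract compares each site against its neighborhood. A minimal generic sketch follows (not the dissertation's generalized local model): each site's residual against its neighbor average is standardized, and large residuals are flagged. The neighbor structure and threshold are illustrative assumptions.

```python
import statistics

def spatial_outliers(values, neighbors, thresh=2.0):
    """Local smoothing-style spatial outlier detection: for site i,
    compute the residual between its value and the average of its
    neighbors' values (neighbors[i] is a list of indices), then flag
    sites whose residual lies more than `thresh` standard deviations
    from the mean residual."""
    resid = [values[i] - statistics.fmean([values[j] for j in nb])
             for i, nb in enumerate(neighbors)]
    mu, sd = statistics.fmean(resid), statistics.stdev(resid)
    return [abs(r - mu) / sd > thresh for r in resid]
```

On a line of sites with values `[1, 1, 1, 10, 1, 1]` and adjacency neighborhoods, only the site with value 10 is flagged at a 1.5-sigma threshold; note that its immediate neighbors also pick up inflated residuals, a small-scale echo of the smoothing side effects the dissertation analyzes.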
29

A Time Series Approach to Removing Outlying Data Points from Bluetooth Vehicle Speed Data

Roth, Jennifer M. 13 December 2010 (has links)
No description available.
30

Multivariate Functional Data Analysis and Visualization

Qu, Zhuo 11 1900 (has links)
As a branch of statistics, functional data analysis (FDA) studies observations regarded as curves, surfaces, or other objects evolving over a continuum. Although methods and theories for FDA have flourished, two issues remain: first, functional data are typically assumed to be sampled on common time grids; second, methods developed for univariate functional data are difficult to apply to multivariate functional data. After exploring model-based fitting for regularly observed multivariate functional data, we explore new visualization tools, clustering methods, and multivariate functional depths for irregularly observed (sparse) multivariate functional data. The four main chapters of the dissertation are organized as follows. First, median polish for functional multivariate analysis of variance (FMANOVA) is proposed in Chapter 2, implemented with multivariate functional depths; numerical studies and environmental datasets illustrate the robustness of median polish. Second, the sparse functional boxplot and the intensity sparse functional boxplot, practical exploratory tools that make visualization possible for both complete and sparse functional data, are introduced in Chapter 3. These tools depict sparseness via the proportion of sparseness and the relative intensity of fitted sparse points inside the central region, respectively. Third, a distance-based robust two-layer partition (RTLP) clustering of sparse multivariate functional data is introduced in Chapter 4. RTLP clustering is based on our proposed elastic time distance (ETD), designed specifically for sparse multivariate functional data. Lastly, the multivariate functional integrated depth and the multivariate functional extremal depth, based on multivariate depths, are proposed in Chapter 5. Global and local formulas for each depth are explored, theoretical properties are proved, and finite-sample depth estimation for irregularly observed multivariate functional data is investigated. In addition, the simplified sparse functional boxplot and the simplified intensity sparse functional boxplot, for visualization without data reconstruction, are introduced. Together, these four extensions make multivariate functional data methods more general and of applied interest in exploratory multivariate functional data analysis.
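Functional boxplots rank curves by a functional depth; a common generic choice is the modified band depth. The sketch below computes MBD with bands formed by pairs of curves (j = 2) for completely observed univariate curves. It is a textbook-style illustration, not the multivariate or sparse depths proposed in the dissertation.

```python
from itertools import combinations

def modified_band_depth(curves):
    """Modified band depth (j=2): for each curve, average over all
    pairs of curves the fraction of time points at which the curve
    lies inside the pair's pointwise envelope.  The deepest curve is
    the functional median; low-depth curves are candidate outliers
    in a functional boxplot."""
    n, T = len(curves), len(curves[0])
    pairs = list(combinations(range(n), 2))
    depth = []
    for c in curves:
        inside = 0
        for i, j in pairs:
            inside += sum(
                min(curves[i][t], curves[j][t]) <= c[t]
                <= max(curves[i][t], curves[j][t])
                for t in range(T))
        depth.append(inside / (len(pairs) * T))
    return depth
```

For the constant curves at levels 0, 1, 2, and 10, the inner curves receive higher depth (5/6) than the extreme ones (1/2), so the far-out level-10 curve would fall outside the boxplot's central region.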
