101 |
A multi-wavelength study of a sample of galaxy clusters / Wilson, Susan January 2012 (has links)
In this dissertation we aim to perform a multi-wavelength analysis of galaxy clusters. We discuss
various methods for clustering in order to determine physical parameters of galaxy clusters
required for this type of study. A selection of galaxy clusters was chosen from four papers (Popesso
et al. 2007b, Yoon et al. 2008, Loubser et al. 2008, Brownstein & Moffat 2006) and restricted
by redshift and galactic latitude to reveal a sample of 40 galaxy clusters with 0.0 < z < 0.15.
Data mining using Virtual Observatory (VO) and a literature survey provided some background
information about each of the galaxy clusters in our sample with respect to optical, radio and
X-ray data. Using Kaye's Mixture Model (KMM) and the Gaussian Mixture Model (GMM),
we determine the most likely cluster member candidates for each source in our sample. We compare
the results obtained to SIMBAD's method of hierarchy. We show that the GMM provides
a very robust method to determine member candidates, but in order to ensure that the right
candidates are chosen we apply a selection of outlier tests to our sources. We determine
a method based on a combination of GMM, the QQ Plot and the Rosner test that provides a
robust and consistent method for determining galaxy cluster members. Comparison between
calculated physical parameters (velocity dispersion, radius, mass and temperature) and values
obtained from the literature shows that the majority of our galaxy clusters agree within a 3σ range.
Inconsistencies are thought to be due to dynamically active clusters that have substructure or
are undergoing mergers, making galaxy member identification difficult. Six correlations between
different physical parameters in the optical and X-ray wavelengths were consistent with
published results. Comparing the velocity dispersion with the X-ray temperature, we found a
relation of σ ∝ T^0.43, as compared to the σ ∝ T^0.5 obtained by Bird et al. (1995). The X-ray luminosity-temperature
and X-ray luminosity-velocity dispersion relations gave L_X ∝ T^2.44
and L_X ∝ σ^2.40, which lie within the uncertainty of the results given by Rozgacheva & Kuvshinova
(2010). These results all suggest that our method for determining galaxy cluster members is
efficient and that application to higher-redshift sources can be considered. Further studies on galaxy
clusters with substructure must be performed in order to improve this method. In future work,
the physical parameters obtained here will be further compared to X-ray and radio properties
in order to determine a link between bent radio sources and the galaxy cluster environment. / MSc (Space Physics), North-West University, Potchefstroom Campus, 2013
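To make the member-selection step concrete, here is a minimal Python sketch, not the author's actual pipeline, of how a two-component Gaussian mixture can separate likely cluster members from field interlopers in line-of-sight velocity space; the synthetic velocities, the two-component setup and the 0.9 membership cutoff are assumptions made for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic line-of-sight velocities (km/s): one cluster plus field interlopers.
cluster = rng.normal(loc=45000.0, scale=800.0, size=120)
interlopers = rng.uniform(30000.0, 60000.0, size=30)
v = np.concatenate([cluster, interlopers]).reshape(-1, 1)

# Two-component GMM: a narrow component (members) and a broad one (field).
gmm = GaussianMixture(n_components=2, random_state=0).fit(v)
member_comp = int(np.argmin(gmm.covariances_.ravel()))   # narrower component = cluster
prob_member = gmm.predict_proba(v)[:, member_comp]
members = v[prob_member > 0.9].ravel()                   # 0.9 cutoff is an assumption

# Velocity dispersion of the selected members (robust estimators such as the
# biweight are common in practice; a plain sample std keeps the sketch short).
print(f"{members.size} members, sigma = {members.std(ddof=1):.0f} km/s")
```

In practice the cutoff would be cross-checked with outlier tests such as the QQ plot and Rosner test mentioned above.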
|
102 |
Multiple Outlier Detection: Hypothesis Tests versus Model Selection by Information Criteria / Lehmann, Rüdiger, Lösler, Michael 14 June 2017 (has links) (PDF)
The detection of multiple outliers can be interpreted as a model selection problem. The models that can be selected are the null model, which indicates an outlier-free set of observations, or a class of alternative models, which contain a set of additional bias parameters. A common way to select the right model is by using a statistical hypothesis test; in geodesy, data snooping is most popular. Another approach arises from information theory. Here, the Akaike information criterion (AIC) is used to select an appropriate model for a given set of observations. The AIC is based on the Kullback-Leibler divergence, which describes the discrepancy between the model candidates. Both approaches are discussed and applied to test problems: the fitting of a straight line and a geodetic network. Some relationships between data snooping and information criteria are discussed. When compared, it turns out that the information criteria approach is simpler and more elegant. Along with AIC there are many alternative information criteria for selecting different outliers, and it is not clear which one is optimal.
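As a rough illustration of the information-criterion route described above, the sketch below fits a straight line and compares the null model against alternative models that each add one bias parameter for a single candidate outlier, keeping whichever model has the smallest AIC. Gaussian observations with known standard deviation, at most one outlier and the synthetic data are all assumptions of this sketch.

```python
import numpy as np

def aic_gauss(res, sigma, k):
    # AIC for a Gaussian model with known sigma, up to an additive constant:
    # AIC = SSR / sigma^2 + 2k, where k is the number of estimated parameters.
    return res @ res / sigma**2 + 2 * k

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 20)
sigma = 0.5
y = 2.0 + 1.5 * x + rng.normal(0.0, sigma, x.size)
y[7] += 4.0                                        # inject one gross error

A = np.column_stack([np.ones_like(x), x])          # design matrix of the line fit

def fit(design):
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef, design.shape[1]      # residuals, parameter count

res0, k0 = fit(A)                                  # null model: no outlier
best = ("null model", aic_gauss(res0, sigma, k0))

for i in range(x.size):                            # one bias parameter per candidate
    e = np.zeros((x.size, 1)); e[i] = 1.0
    res, k = fit(np.hstack([A, e]))
    cand = (f"outlier at i={i}", aic_gauss(res, sigma, k))
    if cand[1] < best[1]:
        best = cand

print("selected:", best)                           # expect the model flagging i=7
```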
|
103 |
Independent component analysis and beyond / Harmeling, Stefan January 2004 (has links)
Independent component analysis (ICA) is a tool for statistical data analysis and signal processing that is able to decompose multivariate signals into their underlying source components. Although the classical ICA model is highly useful, there are many real-world applications that require powerful extensions of ICA. This thesis presents new methods that extend the functionality of ICA:
(1) reliability and grouping of independent components with noise injection,
(2) robust and overcomplete ICA with inlier detection, and
(3) nonlinear ICA with kernel methods.
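The noise-injection idea in (1) can be made concrete with a short sketch: run ICA, perturb the data with a little noise, rerun, and score how well each component survives. This is a simplified stand-in, not the thesis's actual procedure; scikit-learn's FastICA, the synthetic sources and the 5% noise level are assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]        # two independent sources
X = S @ rng.normal(size=(2, 2)).T                       # observed linear mixture

def unmix(data):
    return FastICA(n_components=2, random_state=0).fit_transform(data)

base = unmix(X)
# Noise injection: perturb the data and check whether the recovered
# components stay stable; stable components are considered reliable.
noisy = unmix(X + rng.normal(scale=0.05 * X.std(), size=X.shape))

# Reliability score: best absolute correlation of each baseline component
# with any noisy-run component (components return permuted and sign-flipped).
C = np.abs(np.corrcoef(base.T, noisy.T)[:2, 2:])
print("reliability per component:", C.max(axis=1).round(3))
```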
|
104 |
運用曲面擬合提升幾何法大地起伏值精度之研究 / The Study of Applying Surface Fitting to Improve Geometric Geoidal Undulation / 蔡名曜 Unknown Date (has links)
The geoidal undulation is the difference between the ellipsoidal height and the orthometric height. Given high-accuracy geoidal undulations, high-accuracy orthometric heights can be obtained from ellipsoidal heights measured by GPS, which is expected to replace traditional levelling surveys at far lower cost. Geoidal undulations can be determined geometrically or gravimetrically; the geometric kind is simple to compute and accurate and can be obtained by surface fitting, but it is affected by terrain relief, so fitting over large areas degrades its accuracy, which is a particular difficulty given Taiwan's rugged terrain.
This study uses a buffer method to search for levelling benchmarks around the point of interest, attempting to find the proper range over which to fit geoidal undulations to a curved surface. Experimental results show that fitting geoidal undulations to a second-order surface model achieves the 5 cm level in both prediction error and internal precision over the flat regions of Taiwan, with buffer ranges from 10 km to 30 km. Because ellipsoidal heights from satellite positioning carry larger errors, data quality must also be assessed and gross errors detected. To this end, the study uses quantum-behaved particle swarm optimization to calculate the weight matrix of the least-squares adjustment, with the aim of down-weighting suspicious outliers and thereby detecting them. Experimental results show that the optimized weight matrix algorithm minimizes the influence of outliers on the adjustment.
This study establishes a procedure for fitting geoidal undulations in the Taiwan area: by using buffer analysis to search for adjacent levelling benchmarks, selecting the proper buffer range and surface equation, and detecting outliers in the data, high-quality geoidal undulations can be obtained.
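The core idea, choosing observation weights so that gross errors lose their influence on the least-squares adjustment, can be sketched without the quantum-behaved particle swarm search itself. The stand-in below substitutes iteratively reweighted least squares with Huber weights for QPSO; it only shares the goal of down-weighting suspicious observations, and all data and constants are assumptions.

```python
import numpy as np

def irls_huber(A, y, c=1.345, iters=20):
    # Down-weight suspected gross errors: iteratively reweighted least squares
    # with Huber weights (a simple substitute for the QPSO weight search).
    w = np.ones(len(y))
    for _ in range(iters):
        W = np.diag(w)
        x = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
        r = y - A @ x
        s = 1.4826 * np.median(np.abs(r - np.median(r)))   # robust scale (MAD)
        u = np.abs(r) / max(s, 1e-12)
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))      # Huber weighting
    return x, w

rng = np.random.default_rng(3)
x_true = np.array([10.0, -0.5])
A = np.column_stack([np.ones(30), np.arange(30.0)])
y = A @ x_true + rng.normal(0.0, 0.02, 30)
y[5] += 1.0                                                # inject a gross error

x_hat, w = irls_huber(A, y)
print("estimate:", x_hat.round(3), "| weight of bad obs:", w[5].round(3))
```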
|
105 |
PCA för detektering av avvikande händelser i en kraftvärmeprocess / PCA for outlier detection in a CHP plant / Königsson, Sofia January 2018 (has links)
Boiler 6 at the Högdalen facility in southern Stockholm (P6), combined with a steam turbine, produces combined heat and power (CHP) through the combustion of sorted waste fuel from industry and society. In order to minimise maintenance costs and increase plant availability, it is important to detect process faults and deviations at an early stage. In this study, a method for outlier detection using principal component analysis (PCA) is applied to the CHP production process. A PCA model with reduced dimension is created from process data recorded during a problem-free operating period and is used as a template against which new operating data are compared in a control chart. Deviations from the model should indicate the presence of abnormal conditions, and the reasons for the deviations are analysed. Two cases of tube failure, in 2014 and 2015, are used to study the deviations. The results show that process deviations from the model can be detected in the control chart in both cases of tube failure, and the variables known to be associated with tube failure contribute strongly to the deviating behaviour. There is potential for applying the method to process monitoring; the difficulty lies in creating a model that represents the stable process, since many varying operating conditions are considered stable, and this requires further work. The method can already be used as an analysis tool, for example when a tube failure is suspected.
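A minimal sketch of the monitoring scheme described above: fit a reduced-dimension PCA model on data from a problem-free period, then chart a distance statistic for incoming data against an empirical control limit. Hotelling's T² is used here as the control statistic; the synthetic data, the component count and the 99% limit are assumptions for illustration, and a real deployment would also analyse variable contributions as the thesis does.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
normal = rng.normal(size=(500, 8))            # stand-in for healthy process data
new = rng.normal(size=(50, 8))
new[30:] += 4.0                               # simulated fault (e.g. a tube leak)

pca = PCA(n_components=3).fit(normal)         # reduced-dimension normal model

def t2(model, X):
    # Hotelling T^2: squared scores scaled by each component's variance.
    scores = model.transform(X)
    return np.sum(scores**2 / model.explained_variance_, axis=1)

limit = np.quantile(t2(pca, normal), 0.99)    # empirical 99% control limit
alarms = np.where(t2(pca, new) > limit)[0]
print("alarms at samples:", alarms)           # expect alarms from sample 30 on
```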
|
106 |
Dolování neobvyklého chování v datech trajektorií / Mining Anomalous Behaviour in Trajectory Data / Koňárek, Petr January 2017 (has links)
The goal of this work is to provide an overview of approaches to mining anomalous behaviour in trajectory data. The next part proposes a mining task for outlier detection in trajectories and selects appropriate methods for it. The selected methods are implemented as an application for detecting outlier trajectories.
|
107 |
Multiple Outlier Detection: Hypothesis Tests versus Model Selection by Information Criteria / Lehmann, Rüdiger, Lösler, Michael January 2016 (has links)
The detection of multiple outliers can be interpreted as a model selection problem. The models that can be selected are the null model, which indicates an outlier-free set of observations, or a class of alternative models, which contain a set of additional bias parameters. A common way to select the right model is by using a statistical hypothesis test; in geodesy, data snooping is most popular. Another approach arises from information theory. Here, the Akaike information criterion (AIC) is used to select an appropriate model for a given set of observations. The AIC is based on the Kullback-Leibler divergence, which describes the discrepancy between the model candidates. Both approaches are discussed and applied to test problems: the fitting of a straight line and a geodetic network. Some relationships between data snooping and information criteria are discussed. When compared, it turns out that the information criteria approach is simpler and more elegant. Along with AIC there are many alternative information criteria for selecting different outliers, and it is not clear which one is optimal.
|
108 |
Anomaly Detection for Portfolio Risk Management : An evaluation of econometric and machine learning based approaches to detecting anomalous behaviour in portfolio risk measures / Avvikelsedetektering för Riskhantering av Portföljer : En utvärdering utav ekonometriska och maskininlärningsbaserade tillvägagångssätt för att detektera avvikande beteende hos portföljriskmått / Westerlind, Simon January 2018 (has links)
Financial institutions manage numerous portfolios whose risk must be managed continuously, and the large amounts of data that have to be processed make this a considerable effort. As such, a system that autonomously detects anomalies in the risk measures of financial portfolios would be of great value. To this end, two econometric models, ARMA-GARCH and EWMA, and two machine learning based algorithms, LSTM and HTM, were evaluated for the task of performing unsupervised anomaly detection on streaming time series of portfolio risk measures. Three datasets of returns and Value-at-Risk series were synthesized, and labels were handcrafted for one dataset of real-world Value-at-Risk series for the experiments in this thesis. The results revealed that the LSTM has great potential in this domain, owing to its ability to adapt to different types of time series and its effectiveness at finding a wide range of anomalies. The EWMA had the benefits of being faster and more interpretable, but lacked the ability to capture anomalous trends. The ARMA-GARCH was found to have difficulties in fitting the time series of risk measures, resulting in poor performance, and the HTM was outperformed by the other algorithms in every regard, due to an inability to learn the autoregressive behaviour of the time series.
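As a hedged sketch of the EWMA detector evaluated above: track an exponentially weighted mean and variance of the risk-measure series and flag points that land outside a band of a few (exponentially weighted) standard deviations. The smoothing factor, the band width and the synthetic series are assumptions, not the thesis's exact configuration.

```python
import numpy as np

def ewma_anomalies(x, lam=0.1, z=4.0):
    # Flag points whose residual against an exponentially weighted mean
    # exceeds z EW standard deviations; lam and z are assumed constants.
    mean = x[0]
    var = np.var(x[:20])                  # warm-start variance from the head
    flags = []
    for i in range(1, len(x)):
        resid = x[i] - mean
        if abs(resid) > z * np.sqrt(var):
            flags.append(i)
        mean = lam * x[i] + (1 - lam) * mean
        var = lam * resid**2 + (1 - lam) * var
    return flags

rng = np.random.default_rng(5)
risk = 1.0 + rng.normal(0.0, 0.02, 300)   # stand-in for a Value-at-Risk series
risk[200] += 0.3                          # injected anomaly
print("anomalies at:", ewma_anomalies(risk))
```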
|
109 |
Market Surveillance Using Empirical Quantile Model and Machine Learning / Marknadsövervakning med hjälp av empirisk kvantilmodell och maskininlärning / Landberg, Daniel January 2022 (has links)
In recent years, financial trading has become more accessible. This has led to more market participants and more trades taking place each day. The increased activity also implies an increasing number of abusive trades, and market surveillance systems are developed and used to detect them. In this thesis, two different methods were tested for detecting abusive trades in high-dimensional data: one based on empirical quantiles and one based on an unsupervised machine learning technique called isolation forest. The empirical quantile method applies empirical quantiles to dimensionally reduced data to determine whether a data point is an outlier. Principal component analysis (PCA) is used to reduce the dimensionality of the data and to handle the correlation between features. Isolation forest is a machine learning method that detects outliers by sorting each data point into a tree structure; a data point close to the root is more likely to be an outlier. Isolation forest has been shown to detect outliers in high-dimensional datasets successfully, but had not previously been tested for market surveillance. The performance of both methods was measured by recall and run-time. The conclusion was that the empirical quantile method did not detect outliers accurately when all dimensions of the data were used; it most likely suffered from the curse of dimensionality and could not handle high-dimensional data. Its performance increased, however, when the dimensionality was reduced. Isolation forest performed better than the empirical quantile method and detected 99% of all outliers by classifying 226 data points as outliers in a dataset of 1882 data points containing 184 true outliers.
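A compact sketch of the two detectors compared above, on synthetic stand-in data shaped like the thesis's dataset (1882 points with 184 true outliers; the separable clusters and every threshold below are assumptions, so the printed numbers only mimic, and do not reproduce, the reported results):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(6)
normal = rng.normal(size=(1698, 12))             # stand-in for normal trades
abusive = rng.normal(loc=4.0, size=(184, 12))    # stand-in for abusive trades
X = np.vstack([normal, abusive])
y = np.r_[np.zeros(len(normal)), np.ones(len(abusive))]

# Empirical-quantile detector on PCA-reduced data: fit on normal data,
# flag anything outside per-component [0.5%, 99.5%] empirical quantiles.
pca = PCA(n_components=3).fit(normal)
Zn, Z = pca.transform(normal), pca.transform(X)
lo, hi = np.quantile(Zn, 0.005, axis=0), np.quantile(Zn, 0.995, axis=0)
quant_flags = ((Z < lo) | (Z > hi)).any(axis=1)

forest = IsolationForest(contamination=0.1, random_state=0).fit(X)
forest_flags = forest.predict(X) == -1           # -1 marks outliers

for name, flags in [("quantile+PCA", quant_flags), ("isolation forest", forest_flags)]:
    print(f"{name}: recall={flags[y == 1].mean():.2f}, flagged={int(flags.sum())}")
```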
|
110 |
Exogenous Fault Detection in Aerial Swarms of UAVs / Exogen Feldetektering i Svärmar med UAV:er / Westberg, Maja January 2023 (has links)
In this thesis, the main focus is to formulate and test a suitable model for exogenous fault detection in swarms of unmanned aerial vehicles (UAVs), which are aerial autonomous systems. FOI, the Swedish Defence Research Agency, provided the thesis project and research question. Inspired by previous work, the implementation uses behavioural feature vectors (BFVs) to simulate the movements of the UAVs and to identify anomalies in their behaviour. The chosen algorithm for fault detection is the density-based cluster analysis method known as the Local Outlier Factor (LOF). This method builds on the k-Nearest Neighbour (kNN) algorithm and employs densities to detect outliers. Here it is implemented to detect faulty agents within the swarm based on their behaviour, and a confusion matrix with some associated equations is used to evaluate the accuracy of the method. Six features are selected for examination in the LOF algorithm. The first two assess the number of neighbours in a circle around the agent, while the others consider traversed distance, height, velocity, and rotation. Three fault types are implemented and induced in one of the agents within the swarm: the first two are motor failures and the last is a sensor failure. The algorithm is successfully implemented, and the evaluation of the faults is conducted using three different metrics. Several sets of experiments are performed to assess the optimal value for the LOF threshold and to understand the model's performance. The thesis work results in a strong LOF value which yields an acceptable F1 score, signifying that the accuracy of the implementation is at a satisfactory level.
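To make the LOF step concrete, here is a minimal Python sketch using scikit-learn's LocalOutlierFactor in place of the thesis's own implementation, with hypothetical behavioural feature vectors covering the six features named above (two neighbour counts, traversed distance, height, velocity and rotation); the feature values, the fault signature and n_neighbors=35 are assumptions.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(7)
# Hypothetical BFVs sampled over time for a swarm:
# [neighbours_near, neighbours_far, distance, height, velocity, rotation]
healthy = rng.normal(loc=[5, 8, 1.0, 10.0, 2.0, 0.1],
                     scale=[1, 1, 0.1, 0.5, 0.2, 0.05], size=(500, 6))
faulty = rng.normal(loc=[1, 2, 0.2, 6.0, 0.5, 0.8],
                    scale=[1, 1, 0.1, 0.5, 0.2, 0.05], size=(25, 6))
X = np.vstack([healthy, faulty])

# LOF compares each point's local density with that of its k nearest
# neighbours; k is chosen larger than the injected fault cluster so that
# faulty points' neighbourhoods reach into the healthy data.
lof = LocalOutlierFactor(n_neighbors=35)
labels = lof.fit_predict(X)                  # -1 = outlier, 1 = inlier
print("flagged as faulty:", int((labels[-25:] == -1).sum()), "of 25")
```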
|